1 Placing search in context: the concept revisited

January 2002 ACM Transactions on Information Systems (TOIS), volume 20 issue 1 

Additional Information: full citation, abstract, references, citings, index terms, review



Keyword-based search engines are in widespread use today as a popular means for Web- 
based information retrieval. Although such systems seem deceptively simple, a considerable 
amount of skill is required in order to satisfy non-trivial information needs. This paper 
presents a new conceptual paradigm for performing search in context, that largely 
automates the search process, providing even non-professional users with highly relevant 
results. This paradigm is implemented in practice in the Intelli ... 

Keywords: Search, context, invisible web, semantic processing, statistical natural language 
processing 



2 Placing search in context: the concept revisited 

Lev Finkelstein, Evgeniy Gabrilovich, Yossi Matias, Ehud Rivlin, Zach Solan, Gadi Wolfman, 
Eytan Ruppin 

April 2001 Proceedings of the 10th international conference on World Wide Web 

Full text available: pdf(235.96 KB) Additional Information: full citation, references, citings, index terms



Keywords: context, invisible web, search, semantic processing, statistical natural language 
processing 



3 The textual development of non-stereotypic concepts 
Karin Haenelt, Michael Könyves-Tóth

April 1991 Proceedings of the fifth conference on European chapter of the Association 
for Computational Linguistics 

Full text available: pdf(643.80 KB) Additional Information: full citation, abstract, references, citings

In this paper the text theoretical foundation of our text analysis system KONTEXT is 
described. The basic premise of the KONTEXT model is that new concepts are communicated 
by using the mechanisms of text constitution. The text model used assumes that the 
information conveyed in a text and the information describing its contextual organization
can be structured into five layers (sentence structure, information on thematic progression, 
referential structure, conceptual representation of the text ... 

4 Japanese OCR error correction using character shape similarity and statistical 
language model 
Masaaki Nagata 
August 1998 

Full text available: pdf(686.18 KB) Additional Information: full citation, abstract, references

We present a novel OCR error correction method for languages without word delimiters that 
have a large character set, such as Japanese and Chinese. It consists of a statistical OCR 
model, an approximate word matching method using character shape similarity, and a word 
segmentation algorithm using a statistical language model. By using a statistical OCR model 
and character shape similarity, the proposed error corrector outperforms the previously 
published method. When the baseline character recog ... 




5 Content-based retrieval: VideoQA: question answering on news video
Hui Yang, Lekha Chaisorn, Yunlong Zhao, Shi-Yong Neo, Tat-Seng Chua 
November 2003 Proceedings of the eleventh ACM international conference on 
Multimedia 

Full text available: pdf(592.26 KB) Additional Information: full citation, abstract, references, citings, index terms

When querying a news video archive, the users are interested in retrieving precise answers 
in the form of a summary that best answers the query. However, current video retrieval 
systems, including the search engines on the web, are designed to retrieve documents 
instead of precise answers. This research explores the use of question answering (QA) 
techniques to support personalized news video retrieval. Users interact with our system, 
VideoQA, using short natural language questions with implicit ... 

Keywords: transcript error correction, video question answering, video retrieval, video 
summarization 



6 Recognition of the coherence relation between te-linked clauses
Akira Oishi, Yuji Matsumoto 
August 1998 

Full text available: pdf(643.55 KB) Additional Information: full citation, abstract, references

This paper describes a method for recognizing coherence relations between clauses which 
are linked by te in Japanese, a translational equivalent of English "and". We consider that the
coherence relations are categories each of which has a prototype structure as well as the 
relationships among them. By utilizing this organization of the relations, we can infer an 
appropriate relation from the semantic structures of the clauses between which that relation 
holds. We carried out an experi ... 



7 The FINITE STRING newsletter: Abstracts of current literature 
Computational Linguistics Staff 

July 1984 Computational Linguistics, volume 10 issue 3-4 

Full text available: pdf(2.30 MB) Additional Information: full citation
8 A part of speech estimation method for Japanese unknown words using a statistical
model of morphology and context
Masaaki Nagata 

June 1999 Proceedings of the 37th annual meeting of the Association for 
Computational Linguistics on Computational Linguistics 

Full text available: pdf(765.00 KB) Additional Information: full citation, abstract, references, citings

We present a statistical model of Japanese unknown words consisting of a set of length and 
spelling models classified by the character types that constitute a word. The point is quite 
simple: different character sets should be treated differently and the changes between 
character types are very important because Japanese script has both ideograms like Chinese 
(kanji) and phonograms like English (katakana). Both word segmentation
accuracy and part of speech taggin ... 

9 Lexicon: Interlingual lexical organisation for multilingual lexical databases in NADIA 
Gilles Serasset 

August 1994 Proceedings of the 15th conference on Computational linguistics - Volume 
1 

Full text available: pdf(442.59 KB) Additional Information: full citation, abstract, references

We propose a lexical organisation for multilingual lexical databases (MLDB). This 
organisation is based on acceptions (word-senses). We detail this lexical organisation and 
show a mock-up built to experiment with it. We also present our current work in defining 
and prototyping a specialised system for the management of acception-based MLDB. 

Keywords: acception, linguistic structure, multilingual lexical database 



10 CYC, WordNet, and EDR: critiques and responses
Doug Lenat, George Miller, Toshio Yokoi 

November 1995 Communications of the ACM, volume 38 issue 11

Full text available: pdf(106.09 KB) Additional Information: full citation, abstract, citings, index terms

I applaud Miller's WordNet project and feel that there is much in common in our approaches, 
even though there are fundamental differences in the two expressions of that spirit. Here, I 
list the four differences I noted, closing with a crucial observation concerning the common 
spirit in our work. 

11 Structural analysis of cooking preparation steps in Japanese 
Reiko Hamada, Ichiro Ide, Shuichi Sakai, Hidehiko Tanaka 

November 2000 Proceedings of the fifth international workshop on Information
retrieval with Asian languages 

Full text available: pdf(769.13 KB) Additional Information: full citation, abstract, references

We propose a method to create process flow graphs automatically from textbooks for 
cooking programs. This is realized by understanding context by narrowing down the domain 
to cooking, and making use of domain specific constraints and knowledge. Since it is 
relatively easy to extract significant keywords from cooking procedures, we create a domain 
specific dictionary by statistical methods, and propose a structural analysis method using 
the dictionary. In order to evaluate the ability of the p ... 

Keywords: cookbooks, domain specific dictionary, preparation steps, structural analysis 
Elizabeth D. Liddy

August 1996 Proceedings of the 19th annual international ACM SIGIR conference on 
Research and development in information retrieval 

Full text available: pdf(288.41 KB) Additional Information: full citation, index terms



13 Papers: Extraction of lexical translations from non-aligned corpora 
Kumiko Tanaka, Hideya Iwasaki 

August 1996 Proceedings of the 16th conference on Computational linguistics - Volume 2

Full text available: pdf(571.08 KB) Additional Information: full citation, abstract, references, citings

A method for extracting lexical translations from non-aligned corpora is proposed to cope 
with the unavailability of large aligned corpus. The assumption that "translations of two co- 
occurring words in a source language also co-occur in the target language" is adopted and 
represented in the stochastic matrix formulation. The translation matrix provides the co- 
occurring information translated from the source into the target. This translated co-occurring 
information should resemble that of the ori ... 

14 Using decision trees to construct a practical parser 
Masahiko Haruno, Satoshi Shirai, Yoshifumi Ooyama 
August 1998 

Full text available: pdf(635.46 KB) Additional Information: full citation, abstract, references, citings

This paper describes novel and practical Japanese parsers that use decision trees. First, we
construct a single decision tree to estimate modification probabilities; how one phrase tends 
to modify another. Next, we introduce a boosting algorithm in which several decision trees 
are constructed and then combined for probability estimation. The two constructed parsers 
are evaluated by using the EDR Japanese annotated corpus. The single-tree method 
outperforms the conventional Japanese stochastic m ... 

15 Application of OODB and SGML techniques in text database: an electronic dictionary 
system 
Jian Zhang 

March 1995 ACM SIGMOD Record, volume 24 issue 1

Full text available: pdf(557.23 KB) Additional Information: full citation, abstract, citings, index terms

An electronic dictionary system (EDS) is developed with object-oriented database techniques 
based on ObjectStore. The EDS is composed of two parts: the Database Building Program 
(DBP), and the Database Querying Program (DQP). DBP reads in a dictionary encoded in 
SGML tags, and builds a database composed of a collection of trees which holds dictionary 
entries, and several lists which contain items of various lexical categories. With text 
exchangeability introduced by the SGML, DBP is able to acco ... 

Keywords: SGML, object-oriented databases, text database 



16 Large-scale resources: The automatic creation of lexical entries for a multilingual MT
system

David Farwell, Louise Guthrie, Yorick Wilks 

August 1992 Proceedings of the 14th conference on Computational linguistics - Volume 
2 

Full text available: pdf(436.57 KB) Additional Information: full citation, abstract, references, citings

In this paper, we describe a method of extracting information from an on-line resource for
the construction of lexical entries for a multi-lingual, interlingual MT system (ULTRA). We 
have been able to automatically generate lexical entries for interlingual concepts 
corresponding to nouns, verbs, adjectives and adverbs. Although several features of these 
entries continue to be supplied manually we have greatly decreased the time required to 
generate each entry and see this as a promising method f ... 

17 Selective sampling for example-based word sense disambiguation
Atsushi Fujii, Takenobu Tokunaga, Kentaro Inui, Hozumi Tanaka 

December 1998 Computational Linguistics, volume 24 issue 4

Full text available: pdf(1.74 MB) Additional Information: full citation, abstract, references

This paper proposes an efficient example sampling method for example-based word sense 
disambiguation systems. To construct a database of practical size, a considerable overhead 
for manual sense disambiguation (overhead for supervision) is required. In addition, the 
time complexity of searching a large-sized database poses a considerable problem 
(overhead for search). To counter these problems, our method selectively samples a 
smaller-sized effective subset from a given example set for use in wor ... 

18 General-to-specific model selection for subcategorization preference
Takehito Utsuro, Takashi Miyata, Yuji Matsumoto 

August 1998 

Full text available: pdf(715.70 KB) Additional Information: full citation, abstract, references, citings

This paper proposes a novel method for learning probability models of subcategorization 
preference of verbs. We consider the issues of case dependencies and noun class 
generalization in a uniform way by employing the maximum entropy modeling method. We 
also propose a new model selection algorithm which starts from the most general model and 
gradually examines more specific models. In the experimental evaluation, it is shown that 
both of the case dependencies and speci ... 

19 Lexicon: Analysis of scene identification ability of associative memory with pictorial
dictionary

Tatsuhiko Tsunoda, Hidehiko Tanaka 

August 1994 Proceedings of the 15th conference on Computational linguistics - Volume 
1 

Full text available: pdf(538.99 KB) Additional Information: full citation, abstract, references

Semantic disambiguation depends on a process of defining the appropriate knowledge 
context. Recent research directions suggest a connectionist approach which uses dictionaries,
but there remain problems of scale, analysis, and interpretation. Here we focus on word
disambiguation as scene selection, based on the Oxford Pictorial English Dictionary. We
present results on spatial-scene identification ability using our original associative
memory. We show both theoretical and experimental analysi ...

20 Poster: Practical world modeling for NLP applications
Lynn Carlson, Sergei Nirenburg 

March 1992 Proceedings of the third conference on Applied natural language 
processing 

Full text available: pdf(210.09 KB) Additional Information: full citation, references






Results (page 1): +"electronic dictionary" and +"context" 



Results 1 - 20 of 58

The ACM Portal is published by the Association for Computing Machinery. Copyright © 2005 ACM, Inc.



http://portal.acm.org/results.cfm?coll=ACM&dl=ACM&CFro=49602367 



7/8/05 



Results (page 2): +"electronic dictionary" and +"context" 




Found 58 of 157,956 

Results 21 - 40 of 58

21 A stochastic language model using dependency and its improvement by word
clustering

Shinsuke Mori, Makoto Nagao 
August 1998 

Full text available: pdf(550.06 KB) Additional Information: full citation, abstract, references, citings

In this paper, we present a stochastic language model for Japanese using dependency. The 
prediction unit in this model is an attribute of "bunsetsu". This is represented by the product 
of the head of content words and that of function words. The relation between the attributes 
of "bunsetsu" is ruled by a context-free grammar. The word sequences are predicted from 
the attribute using word n-gram model. The spelling of an unknown word is predicted using
character n-gram model. This model is robust in tha ... 

22 Algorithms for grapheme-phoneme translation for English and French: applications for
database searches and speech synthesis

Michel Divay, Anthony J. Vitale 

December 1997 Computational Linguistics, Volume 23 Issue 4 

Full text available: pdf(1.92 MB) Additional Information: full citation, abstract, references, citings

Letter-to-sound rules, also known as grapheme-to-phoneme rules, are important
computational tools and have been used for a variety of purposes including word or name
lookups for database searches and speech synthesis. These rules are especially useful when
integrated into database searches on names and addresses, since they can complement 
orthographic search algorithms that make use of permutation, deletion, and insertion by 
allowing for a comparison with the phonetic equivalent. In databases, ph ... 

23 Automatic extraction of aspectual information from a monolingual corpus
Akira Oishi, Yuji Matsumoto 

July 1997 

Full text available: pdf(712.44 KB) Additional Information: full citation, abstract, references



This paper describes an approach to extract the aspectual information of Japanese verb 
phrases from a monolingual corpus. We classify verbs into six categories by means of the
aspectual features which are defined on the basis of the possibility of co-occurrence with 
aspectual forms and adverbs. A unique category could be identified for 96% of the target 
verbs. To evaluate the result of the experiment, we examined the meaning of -teiru which is
one of the most fundamental aspectual markers ... 

24 Poster Sessions: A tagger/lemmatiser for Dutch medical language 
Peter Spyns 

August 1996 Proceedings of the 16th conference on Computational linguistics - Volume 2

Full text available: pdf(362.08 KB) Additional Information: full citation, abstract, references

In this paper, we want to describe a tagger/lemmatiser for Dutch medical vocabulary, which 
consists of a full-form dictionary and a morphological recogniser for unknown vocabulary 
coupled to an expert system-like disambiguation module. Attention is also paid to the main 
datastructures: a lexical database and feature bundles implemented as directed acyclic 
graphs, some evaluation results are presented as well. The tagger/lemmatiser currently 
functions as a lexical front-end for a syntactic parser ... 

25 Natural language processing 
Yorick Wilks

January 1996 Communications of the ACM, volume 39 issue 1

Full text available: pdf(238.26 KB) Additional Information: full citation, index terms



26 A hybrid Japanese parser with hand-crafted grammar and statistics 
Hiroshi Kanayama, Kentaro Torisawa, Yutaka Mitsuishi, Jun'ichi Tsujii 

July 2000 Proceedings of the 18th conference on Computational linguistics - Volume 1 

Full text available: pdf(680.76 KB) Additional Information: full citation, abstract, references, citings

This paper describes a hybrid parsing method for Japanese which uses both a hand-crafted 
grammar and a statistical technique. The key feature of our system is that in order to 
estimate likelihood for a parse tree, the system uses information taken from alternative 
partial parse trees generated by the grammar. This utilization of alternative trees enables us 
to construct a new statistical model called Triplet/Quadruplet Model. We show that this 
model can capture a certain tendency in Japan ... 

27 Part-of-speech induction from scratch 
Hinrich Schütze

June 1993 Proceedings of the 31st annual meeting on Association for Computational 
Linguistics 

Full text available: pdf(717.90 KB) Additional Information: full citation, abstract, references, citings

This paper presents a method for inducing the parts of speech of a language and part-of- 
speech labels for individual words from a large text corpus. Vector representations for the 
part-of-speech of a word are formed from entries of its near lexical neighbors. A 
dimensionality reduction creates a space representing the syntactic categories of 
unambiguous words. A neural net trained on these spatial representations classifies 
individual contexts of occurrence of ambiguous words. The method classif ... 

28 Techniques for automatically correcting words in text
Karen Kukich 

December 1992 ACM Computing Surveys (CSUR), Volume 24 Issue 4 

Full text available: pdf(6.23 MB) Additional Information: full citation, abstract, references, citings, index terms, review

Research aimed at correcting words in text has focused on three progressively more difficult 
problems: (1) nonword error detection; (2) isolated-word error correction; and (3) context-
dependent word correction. In response to the first problem, efficient pattern-matching and
n-gram analysis techniques have been developed for detecting strings that do not appear in 
a given word list. In response to the second problem, a variety of general and application- 
specific spelling cor ... 

Keywords: n-gram analysis, Optical Character Recognition (OCR), context-dependent 
spelling correction, grammar checking, natural-language-processing models, neural net 
classifiers, spell checking, spelling error detection, spelling error patterns, statistical- 
language models, word recognition and correction 



29 A stochastic finite-state word-segmentation algorithm for Chinese 
Richard Sproat, William Gale, Chilin Shih, Nancy Chang 

September 1996 Computational Linguistics, Volume 22 issue 3 

Full text available: pdf(1.91 MB) Additional Information: full citation, abstract, references, citings

The initial stage of text analysis for any NLP task usually involves the tokenization of the 
input into words. For languages like English one can assume, to a first approximation, that 
word boundaries are given by whitespace or punctuation. In various Asian languages, 
including Chinese, on the other hand, whitespace is never used to delimit words, so one 
must resort to lexical information to "reconstruct" the word-boundary information. In this 
paper we present a stochastic finite-state model whe ... 

30 Use WWW resources for translation classes in Taiwan 
Li-yi Huang 

July 1998 ACM SIGCUE Outlook Volume 26 Issue 3 

Full text available: pdf(550.78 KB) Additional Information: full citation, abstract, index terms

In the past, dictionaries, encyclopedia and other reference books were indispensable tools 
for people doing translation. And it has been very common for students in translation 
classes to be handed a long list of text books, all sorts of dictionaries, encyclopedia and 
other reference books. It is almost impossible for the students to spend a lot of money to 
buy them all. Besides, the heavy weight of those reference books, the shortage of space for 
them and the huge amount of time that the studen ... 

31 Association-based natural language processing with neural networks 
Kimura Kazuhiro, Suzuoka Takashi, Amano Sin-ya 

June 1992 Proceedings of the 30th annual meeting on Association for Computational 
Linguistics 

Full text available: pdf(450.05 KB) Additional Information: full citation, abstract, references



This paper describes a natural language processing system reinforced by the use of 
association of words and concepts, implemented as a neural network. Combining an 
associative network with a conventional system contributes to semantic disambiguation in 
the process of interpretation. The model is employed within a kana-kanji conversion system 
and the advantages over conventional ones are shown. 

32 A unification-based approach to morpho-syntactic parsing of agglutinative and other
(highly) inflectional languages 
Gábor Prószéky, Balázs Kis

June 1999 Proceedings of the 37th annual meeting of the Association for 
Computational Linguistics on Computational Linguistics 
Full text available: pdf(605.73 KB) Additional Information: full citation, abstract, references



This paper introduces a new approach to morpho-syntactic analysis through Humor 99
(High-speed Unification Morphology), a reversible and
unification-based morphological analyzer which has already been integrated with a variety of 
industrial applications. Humor 99 successfully copes with problems of agglutinative (e.g. 
Hungarian, Turkish, Estonian) and other (highly) inflectional languages (e.g. Polish, Czech, 
German) very effectively. T ... 

33 Special issue on using large corpora: I: Introduction to the special issue on 
computational linguistics using large corpora 
Kenneth W. Church, Robert L. Mercer 
March 1993 Computational Linguistics, volume 19 issue 1



34 Poster papers: Discovering word senses from text 
Patrick Pantel, Dekang Lin 

July 2002 Proceedings of the eighth ACM SIGKDD international conference on 
Knowledge discovery and data mining 

Full text available: pdf(661.99 KB) Additional Information: full citation, abstract, references, index terms

Inventories of manually compiled dictionaries usually serve as a source for word senses. 
However, they often include many rare senses while missing corpus/domain-specific senses. 
We present a clustering algorithm called CBC (Clustering By Committee) that automatically 
discovers word senses from text. It initially discovers a set of tight clusters called 
committees that are well scattered in the similarity space. The centroid of the members of a 
committee is used as the feature vector of the clus ... 

Keywords: clustering, evaluation, machine learning, word sense discovery 



35 Poster Sessions: Word extraction from corpora and its part-of-speech estimation using
distributional analysis
Shinsuke Mori, Makoto Nagao 

August 1996 Proceedings of the 16th conference on Computational linguistics - Volume 



Full text available: pdf(360.98 KB) Additional Information: full citation, abstract, references, citings



Unknown words are inevitable at any step of analysis in natural language processing. We 
propose a method to extract words from a corpus and estimate the probability that each 
word belongs to given parts of speech (POSs), using a distributional analysis. Our 
experiments have shown that this method is effective for inferring the POS of unknown 
words. 

36 Lexicon: Noun phrasal entries in the EDR English word dictionary
A. Koizumi, M. Arioka, C. Harada, M. Sugimoto, L. Guthrie, C. Watts, R. Catizone, Y. Wilks 
August 1994 Proceedings of the 15th conference on Computational linguistics - Volume 
1 

Full text available: pdf(480.83 KB) Additional Information: full citation, references

Keywords: lexicon construction, resources for CL, universal features

37 Query term disambiguation for Web cross-language information retrieval using a
search engine 

Akira Maeda, Fatiha Sadat, Masatoshi Yoshikawa, Shunsuke Uemura 
November 2000 Proceedings of the fifth international workshop on Information
retrieval with Asian languages 

Full text available: pdf(736.31 KB) Additional Information: full citation, abstract, references, citings

With the worldwide growth of the Internet, research on Cross-Language Information 
Retrieval (CLIR) is being paid much attention. Existing CLIR approaches based on query
translation require parallel corpora or comparable corpora for the disambiguation of 
translated query terms. However, those natural language resources are not readily 
available. In this paper, we propose a disambiguation method for dictionary-based query 
translation that is independent of the availability of such scarce langua ... 

Keywords: WWW, cross-language information retrieval, mutual information, search engine 
38 Papers: Inherited Feature-based Similarity Measure based on large semantic hierarchy
and large text corpus

Hideki Hirakawa, Zhonghui Xu, Kenneth Haase 

August 1996 Proceedings of the 16th conference on Computational linguistics - Volume 
1 

Full text available: pdf(623.25 KB) Additional Information: full citation, abstract, references, citings

We describe a similarity calculation model called IFSM (Inherited Feature Similarity 
Measure) between objects (words/concepts) based on their common and distinctive 
features. We propose an implementation method for obtaining features based on abstracted 
triples extracted from a large text corpus utilizing taxonomical knowledge. This model 
represents an integration of traditional methods, i.e., relation based similarity measure and 
distribution based similarity measure. An experiment, using our n ... 



39 A rule-based hyphenator for Modern Greek 
Theodora I. Noussia 

September 1997 Computational Linguistics, volume 23 issue 3 

Full text available: pdf(1.16 MB) Additional Information: full citation, abstract, references

The purpose of this paper is to formally examine hyphenation as it pertains to Modern Greek 
with the aim of achieving accurate and thorough machine hyphenation. Grammar rules are 
interpreted and formally expressed in terms of regular expressions of word substrings, and 
exact hyphenation rules are derived. Vowel splitting, which traditionally is indicated in terms 
of prohibitive rather than explicit grammar rules, is examined in detail. Many ambiguities 
caused by circular definitions of the prohi ... 



40 Large-scale resources: A Chinese corpus for linguistic research
Chu-Ren Huang, Keh-jiann Chen 

August 1992 Proceedings of the 14th conference on Computational linguistics - Volume 
4 

Full text available: pdf(259.12 KB) Additional Information: full citation, abstract, references, citings

This is a project note on the first stage of the construction of a comprehensive corpus of 
both Modern and Classical Chinese. The corpus is built with the dual aim of serving as the 
central database for Chinese language processing and for supporting in-depth linguistic 
research in Mandarin Chinese. 
41 Structural disambiguation based on reliable estimation of strength of association
Haodong Wu, Eduardo de Paiva Alves, Teiji Furugori 
August 1998 



Full text available: pdf(574.67 KB) Additional Information: full citation, abstract, references



This paper proposes a new class-based method to estimate the strength of association in 
word co-occurrence for the purpose of structural disambiguation. To deal with sparseness of 
data, we use a conceptual dictionary as the source for acquiring upper classes of the words 
related in the co-occurrence, and then use t-scores to determine a pair of classes to be 
employed for calculating the strength of association. We have applied our method to 
determining dependency relations in Japanese and prepos ... 

42 An intelligent multi-dictionary environment 
Gábor Prószéky

August 1998 

Full text available: pdf(684.82 KB) Additional Information: full citation, abstract, references

An open, extendible multi-dictionary system is introduced in the paper. It supports the 
translator in accessing adequate entries of various bi- and monolingual dictionaries and 
translation examples from parallel corpora. Simultaneously an unlimited number of 
dictionaries can be held open, thus by a single interrogation step, all the dictionaries 
(translations, explanations, synonyms, etc.) can be surveyed. The implemented system 
(called MoBiDic) knows morphological rules of the dictionaries' lan ...

43 Developing hypertext documents for an international audience 
Elizabeth S. Spragins 

November 1992 Proceedings of the 10th annual international conference on Systems 
documentation 

Full text available: pdf(795.30 KB) Additional Information: full citation , references , index terms



44 Use of mutual information based character clusters in dictionary-less morphological 
analysis of Japanese 

Hideki Kashioka, Yasuhiro Kawata, Yumiko Kinjo, Andrew Finch, Ezra W. Black 
August 1998

Full text available: pdf(469.55 KB) Additional Information: full citation , abstract , references , citings
Publisher Site

For languages whose character set is very large and whose orthography does not require 
spacing between words, such as Japanese, tokenizing and part-of-speech tagging are often 
the difficult parts of any morphological analysis. For practical systems to tackle this problem, 
uncontrolled heuristics are primarily used. The use of information on character sorts, 
however, mitigates this difficulty. This paper presents our method of incorporating character 
clustering based on mutual information into De ... 

45 Learning probabilistic subcategorization preference by identifying case dependencies
and optimal noun class generalization level 

Takehito Utsuro, Yuji Matsumoto 

March 1997 Proceedings of the fifth conference on Applied natural language 
processing 

Full text available: pdf(844.41 KB) Additional Information: full citation , abstract , references , citings
Publisher Site

This paper proposes a novel method of learning probabilistic subcategorization preference. 
In the method, for the purpose of coping with the ambiguities of case dependencies and 
noun class generalization of argument/adjunct nouns, we introduce a data structure which 
represents a tuple of independent partial subcategorization frames. Each collocation of a 
verb and argument/adjunct nouns is assumed to be generated from one of the possible 
tuples of independent partial subcategorization frames. Par ... 

46 Estimating understandability of software documents
Kari Laitinen 

July 1996 ACM SIGSOFT Software Engineering Notes, volume 21 issue 4 

Full text available: pdf(878.94 KB) Additional Information: full citation , abstract , index terms

Software developers and maintainers need to read and understand source programs and 
other kinds of software documents in their work. Understandability of software documents is 
thus important. This paper introduces a method for estimating the understandability of 
software documents. The method is based on a language theory according to which every 
software document is considered to contain a language of its own, which is a set of symbols. 
The understandability of documents written according to di ... 

47 HCI in the developing world: Enabling computer interaction in the indigenous
languages of South Africa: the central role of computational morphology
Laurette Pretorius, Sonja E. Bosch 
March 2003 interactions, Volume 10 issue 2 

Full text available: pdf(341.79 KB), html(29.72 KB) Additional Information: full citation , references , index terms



48 Helping users navigate in multimedia documents: the affective domain 
Marcia Peoples Halio 

November 1992 Proceedings of the 10th annual international conference on Systems 
documentation 

Full text available: pdf(512.62 KB) Additional Information: full citation , references , citings , index terms

49 Subverting Structure: Data-Driven Diagram Generation
Gene Golovchinsky, Klaus Reichenberger, Thomas Kamps 
October 1995 Proceedings of the 6th conference on Visualization '95 

Full text available: pdf(848.62 KB) Additional Information: full citation , abstract
Publisher Site

Diagrams are data representations that convey information predominantly through 
combinations of graphical elements rather than through other channels such as text or 
interaction. We have implemented a prototype called AVE (Automatic Visualization 
Environment) that generates diagrams automatically based on a generative theory of 
diagram design. According to this theory, diagrams are constructed based on the data to be 
visualized rather than by selection from a predefined set of diagrams. This app ... 

50 Computational methods ("paradigms"): The typology of unknown words: an
experimental study of two corpora
Xiaobo Ren, François Perrault

August 1992 Proceedings of the 14th conference on Computational linguistics - Volume 
1 

Full text available: pdf(464.19 KB) Additional Information: full citation , references 



51 Performance systems technology (PST) and computer-based instruction (CBI): tools
for instructional designers in the 21st Century, part II 
Gloria A. Reece 

November 1997 ACM SIGDOC Asterisk Journal of Computer Documentation, Volume 21
Issue 4

Full text available: pdf(304.72 KB) Additional Information: full citation , index terms



52 Panel 4: The Internet a "natural" channel for language learning
Inui Kentaro 

August 1996 Proceedings of the 16th conference on Computational linguistics - Volume 
2 

Full text available: pdf(126.67 KB) Additional Information: full citation



53 Short Papers: Ontology modeling tool with concept dictionary 
Yoichi Hiramatsu, Seiji Koide 

January 2004 Proceedings of the 9th international conference on Intelligent user 
interface 

Full text available: pdf(196.04 KB) Additional Information: full citation , abstract , references , index terms

The usefulness of ontology is strongly dependent on the knowledge representation policy 
and its maintenance. The subject of knowledge representation and modeling tool has been 
one of the exciting themes among ontology scientists. Some ontology editing tools were 
born and grew up in the field of expert system and others designed originally by ontology 
research groups. Key features of the newly implemented tool are: reference to the concept 
dictionary to find out semantics of the words, and use of ... 

Keywords: concept dictionary, editing tool, inference, ontology modeling, web service 

relational databases

August 2001 ACM Transactions on Internet Technology (TOIT), volume 1 issue 1

Full text available: pdf(264.27 KB) Additional Information: full citation , abstract , references , citings , index
terms , review

This article describes XRel, a novel approach for storage and retrieval of XML documents 
using relational databases. In this approach, an XML document is decomposed into nodes on 
the basis of its tree structure and stored in relational tables according to the node type, with 
path information from the root to each node. XRel enables us to store XML documents using 
a fixed relational schema without any information about DTDs and also to utilize indices 
such as the B+ 

Keywords: XML query, XPath, text markup, text tagging 



55 New horizons in commercial and industrial AI
Toshinori Munakata

November 1995 Communications of the ACM, volume 38 issue 11
Full text available: pdf(400.28 KB) Additional Information: full citation , abstract , index terms

AI as a field has undergone rapid growth in diversification and practicality. For the past 10 
years, the repertoire of AI techniques has evolved and expanded. Scores of newer fields 
have recently been added to the traditional domains of practical AI. Although much practical 
AI is still best characterized as advanced computing rather than intelligence, applications in 
everyday commercial and industrial settings have certainly increased, especially since 1990. 
Additionally, A ... 




56 Cross-language Information Retrieval: Comparing cross-language query expansion 
techniques by degrading translation resources 
Paul McNamee, James Mayfield 

August 2002 Proceedings of the 25th annual international ACM SIGIR conference on 
Research and development in Information retrieval 

Full text available: pdf(267.21 KB) Additional Information: full citation , abstract , references , citings , index
terms

The quality of translation resources is arguably the most important factor affecting the 
performance of a cross-language information retrieval system. While many investigations 
have explored the use of query expansion techniques to combat errors induced by 
translation, no study has yet examined the effectiveness of these techniques across 
resources of varying quality. This paper presents results using parallel corpora and bilingual 
wordlists that have been deliberately degraded prior to query tr ... 

Keywords: cross-language information retrieval, query expansion, query translation, 
translation resources 



57 Anchor text mining for translation of Web queries: A transitive translation approach
Wen-Hsiang Lu, Lee-Feng Chien, Hsi-Jian Lee 

April 2004 ACM Transactions on Information Systems (TOIS), Volume 22 issue 2 

Full text available: pdf(280.55 KB) Additional Information: full citation , abstract , references , citings , index
terms

To discover translation knowledge in diverse data resources on the Web, this article 
proposes an effective approach to finding translation equivalents of query terms and 
constructing multilingual lexicons through the mining of Web anchor texts and link 
structures. Although Web anchor texts are wide-scoped hypertext resources, not every 
particular pair of languages contains sufficient anchor texts for effective extraction of 
translations for Web queries. For more generalized applications, the app ... 

Keywords: Multilingual translation, anchor text mining, competitive linking algorithm,
cross-language Web search, cross-language information retrieval 



58 Electronic component information exchange (ECIX)
Donald R. Cottrell 

June 1997 Proceedings of the 34th annual conference on Design automation - Volume 
00 

Full text available: pdf(70.34 KB) Additional Information: full citation , abstract , references , citings , index terms
Publisher Site

A number of industry trends are shaping the requirements for IC and electronic equipment
design. The density and complexity of circuit technologies have increased to a point where
design cannot be performed without EDA tools. The availability of completely designed and
verified reusable design components has become a major impediment to meeting required
design productivity goals. Design reuse is moving down the package hierarchy to include chip
design in addition to PCA design. At the same time, the wid ...

1 Techniques for automatically correcting words in text
Karen Kukich 

December 1992 ACM Computing Surveys (CSUR), volume 24 issue 4 

Additional Information: full citation , abstract , references , citings , index 
terms , review 



Full text available: pdf(6.23 MB)



Research aimed at correcting words in text has focused on three progressively more difficult 
problems: (1) nonword error detection; (2) isolated-word error correction; and (3) context-
dependent word correction. In response to the first problem, efficient pattern-matching and
n-gram analysis techniques have been developed for detecting strings that do not appear in 
a given word list. In response to the second problem, a variety of general and application- 
specific spelling cor ... 

Keywords: n-gram analysis, Optical Character Recognition (OCR), context-dependent 
spelling correction, grammar checking, natural-language-processing models, neural net 
classifiers, spell checking, spelling error detection, spelling error patterns, statistical- 
language models, word recognition and correction 



2 Placing search in context: the concept revisited

Lev Finkelstein, Evgeniy Gabrilovich, Yossi Matias, Ehud Rivlin, Zach Solan, Gadi Wolfman, 
Eytan Ruppin 

April 2001 Proceedings of the 10th international conference on World Wide Web 

Full text available: pdf(235.96 KB) Additional Information: full citation , references , citings , index terms



Keywords: context, invisible web, search, semantic processing, statistical natural language 
processing 



3 Placing search in context: the concept revisited 

January 2002 ACM Transactions on Information Systems (TOIS), Volume 20 Issue 1 

Full text available: pdf(926.20 KB) Additional Information: full citation , abstract , references , citings , index
terms , review

Keyword-based search engines are in widespread use today as a popular means for Web-
based information retrieval. Although such systems seem deceptively simple, a considerable
amount of skill is required in order to satisfy non-trivial information needs. This paper 
presents a new conceptual paradigm for performing search in context, that largely 
automates the search process, providing even non-professional users with highly relevant 
results. This paradigm is implemented in practice in the Intelli ... 

Keywords: Search, context, invisible web, semantic processing, statistical natural language 
processing 



4 Content-based retrieval: VideoQA: question answering on news video
Hui Yang, Lekha Chaisorn, Yunlong Zhao, Shi-Yong Neo, Tat-Seng Chua 
November 2003 Proceedings of the eleventh ACM international conference on 
Multimedia 

Full text available: pdf(592.26 KB) Additional Information: full citation , abstract , references , citings , index
terms

When querying a news video archive, the users are interested in retrieving precise answers 
in the form of a summary that best answers the query. However, current video retrieval 
systems, including the search engines on the web, are designed to retrieve documents 
instead of precise answers. This research explores the use of question answering (QA) 
techniques to support personalized news video retrieval. Users interact with our system, 
VideoQA, using short natural language questions with implicit ... 

Keywords: transcript error correction, video question answering, video retrieval, video 
summarization 



5 Selective sampling for example-based word sense disambiguation
Atsushi Fujii, Takenobu Tokunaga, Kentaro Inui, Hozumi Tanaka 
December 1998 Computational Linguistics, Volume 24 Issue 4 

Full text available: pdf(1.74 MB) Additional Information: full citation , abstract , references
Publisher Site

This paper proposes an efficient example sampling method for example-based word sense 
disambiguation systems. To construct a database of practical size, a considerable overhead 
for manual sense disambiguation (overhead for supervision) is required. In addition, the 
time complexity of searching a large-sized database poses a considerable problem 
(overhead for search). To counter these problems, our method selectively samples a 
smaller-sized effective subset from a given example set for use in wor ... 



6 Algorithms for grapheme-phoneme translation for English and French: applications for
database searches and speech synthesis 
Michel Divay, Anthony J. Vitale 

December 1997 Computational Linguistics, Volume 23 issue 4 

Full text available: pdf(1.92 MB) Additional Information: full citation , abstract , references , citings
Publisher Site

Letter-to-sound rules, also known as grapheme-to-phoneme rules, are important
computational tools and have been used for a variety of purposes including word or name
lookups for database searches and speech synthesis. These rules are especially useful when
integrated into database searches on names and addresses, since they can complement 
orthographic search algorithms that make use of permutation, deletion, and insertion by 
allowing for a comparison with the phonetic equivalent. In databases, ph ... 

7 Poster Sessions: A tagger/lemmatiser for Dutch medical language
Peter Spyns 

August 1996 Proceedings of the 16th conference on Computational linguistics - Volume 
2 

Full text available: pdf(362.08 KB) Additional Information: full citation , abstract , references

In this paper, we want to describe a tagger/lemmatiser for Dutch medical vocabulary, which 
consists of a full-form dictionary and a morphological recogniser for unknown vocabulary 
coupled to an expert system-like disambiguation module. Attention is also paid to the main 
datastructures: a lexical database and feature bundles implemented as directed acyclic 
graphs, some evaluation results are presented as well. The tagger/lemmatiser currently 
functions as a lexical front-end for a syntactic parser ... 

8 Poster Sessions: Word extraction from corpora and its part-of-speech estimation using
distributional analysis
Shinsuke Mori, Makoto Nagao 

August 1996 Proceedings of the 16th conference on Computational linguistics - Volume 
2 

Full text available: pdf(360.98 KB) Additional Information: full citation , abstract , references , citings

Unknown words are inevitable at any step of analysis in natural language processing. We 
propose a method to extract words from a corpus and estimate the probability that each 
word belongs to given parts of speech (POSs), using a distributional analysis. Our 
experiments have shown that this method is effective for inferring the POS of unknown 
words. 

9 Cross-language Information Retrieval: Comparing cross-language query expansion
techniques by degrading translation resources
Paul McNamee, James Mayfield 

August 2002 Proceedings of the 25th annual international ACM SIGIR conference on 
Research and development in information retrieval 

Full text available: pdf(267.21 KB) Additional Information: full citation , abstract , references , citings , index
terms

The quality of translation resources is arguably the most important factor affecting the 
performance of a cross-language information retrieval system. While many investigations 
have explored the use of query expansion techniques to combat errors induced by 
translation, no study has yet examined the effectiveness of these techniques across 
resources of varying quality. This paper presents results using parallel corpora and bilingual 
wordlists that have been deliberately degraded prior to query tr ... 

Keywords: cross-language information retrieval, query expansion, query translation, 
translation resources 



10 A part of speech estimation method for Japanese unknown words using a statistical 
model of morphology and context 
Masaaki Nagata 

June 1999 Proceedings of the 37th annual meeting of the Association for 
Computational Linguistics on Computational Linguistics 

Full text available: pdf(765.00 KB) Additional Information: full citation , abstract , references , citings

We present a statistical model of Japanese unknown words consisting of a set of length and 
spelling models classified by the character types that constitute a word. The point is quite 
simple: different character sets should be treated differently and the changes between 
character types are very important because Japanese script has both ideograms like Chinese 
(kanji) and phonograms like English (katakana). Both word segmentation
accuracy and part of speech taggin ...

11 The textual development of non-stereotypic concepts 
Karin Haenelt, Michael Konyves-Toth 

April 1991 Proceedings of the fifth conference on European chapter of the Association 
for Computational Linguistics 

Full text available: pdf(643.80 KB) Additional Information: full citation , abstract , references , citings
Publisher Site



In this paper the text theoretical foundation of our text analysis system KONTEXT is 
described. The basic premise of the KONTEXT model is that new concepts are communicated 
by using the mechanisms of text constitution. The text model used assumes that the 
information conveyed in a text and the information describing its contextual organization 
can be structured into five layers (sentence structure, information on thematic progression, 
referential structure, conceptual representation of the text ... 

12 The FINITE STRING newsletter: Abstracts of current literature 
Computational Linguistics Staff 

July 1984 Computational Linguistics, volume 10 issue 3-4 

Full text available: pdf(2.30 MB) Additional Information: full citation
Publisher Site



13 Structural analysis of cooking preparation steps in Japanese
Reiko Hamada, Ichiro Ide, Shuichi Sakai, Hidehiko Tanaka 

November 2000 Proceedings of the fifth international workshop on Information
retrieval with Asian languages 

Full text available: pdf(769.13 KB) Additional Information: full citation , abstract , references

We propose a method to create process flow graphs automatically from textbooks for 
cooking programs. This is realized by understanding context by narrowing down the domain 
to cooking, and making use of domain specific constraints and knowledge. Since it is 
relatively easy to extract significant keywords from cooking procedures, we create a domain 
specific dictionary by statistical methods, and propose a structural analysis method using 
the dictionary. In order to evaluate the ability of the p ... 

Keywords: cookbooks, domain specific dictionary, preparation steps, structural analysis 

14 Special issue on using large corpora: I: Introduction to the special issue on
computational linguistics using large corpora
Kenneth W. Church, Robert L. Mercer
March 1993 Computational Linguistics, volume 19 issue 1

Full text available: pdf(1.80 MB) Additional Information: full citation , references , citings
Publisher Site



15 A stochastic finite-state word-segmentation algorithm for Chinese 
Richard Sproat, William Gale, Chilin Shih, Nancy Chang
September 1996 Computational Linguistics, volume 22 issue 3

Full text available: pdf(1.91 MB) Additional Information: full citation , abstract , references , citings
Publisher Site

The initial stage of text analysis for any NLP task usually involves the tokenization of the
input into words. For languages like English one can assume, to a first approximation, that 
word boundaries are given by whitespace or punctuation. In various Asian languages, 
including Chinese, on the other hand, whitespace is never used to delimit words, so one 
must resort to lexical information to "reconstruct" the word-boundary information. In this 
paper we present a stochastic finite-state model whe ... 

16 Japanese OCR error correction using character shape similarity and statistical
language model
Masaaki Nagata 
August 1998 

Full text available: pdf(686.18 KB) Additional Information: full citation , abstract , references

We present a novel OCR error correction method for languages without word delimiters that 
have a large character set, such as Japanese and Chinese. It consists of a statistical OCR 
model, an approximate word matching method using character shape similarity, and a word 
segmentation algorithm using a statistical language model. By using a statistical OCR model 
and character shape similarity, the proposed error corrector outperforms the previously 
published method. When the baseline character recog ... 

17 Using decision trees to construct a practical parser 
Masahiko Haruno, Satoshi Shirai, Yoshifumi Ooyama 
August 1998 

Full text available: pdf(635.46 KB) Additional Information: full citation , abstract , references , citings
Publisher Site

This paper describes novel and practical Japanese parsers that use decision trees. First, we
construct a single decision tree to estimate modification probabilities; how one phrase tends 
to modify another. Next, we introduce a boosting algorithm in which several decision trees 
are constructed and then combined for probability estimation. The two constructed parsers 
are evaluated by using the EDR Japanese annotated corpus. The single-tree method 
outperforms the conventional Japanese stochastic m ... 



18 CYC, WordNet, and EDR: critiques and responses
Doug Lenat, George Miller, Toshio Yokoi

I applaud Miller's WordNet project and feel that there is much in common in our approaches,
even though there are fundamental differences in the two expressions of that spirit. Here, I 
list the four differences I noted, closing with a crucial observation concerning the common 
spirit in our work. 



19 A stochastic language model using dependency and its improvement by word
clustering

Shinsuke Mori, Makoto Nagao
August 1998

In this paper, we present a stochastic language model for Japanese using dependency. The
prediction unit in this model is an attribute of "bunsetsu". This is represented by the product 
of the head of content words and that of function words. The relation between the attributes 
of "bunsetsu" is ruled by a context-free grammar. The word sequences are predicted from 
the attribute using word n-gram model. The spelling of an unknown word is predicted using
character n-gram model. This model is robust in tha ... 

20 Automatic extraction of aspectual information from a monolingual corpus
Akira Oishi, Yuji Matsumoto 
July 1997 

This paper describes an approach to extract the aspectual information of Japanese verb
phrases from a monolingual corpus. We classify verbs into six categories by means of the
aspectual features which are defined on the basis of the possibility of co-occurrence with 
aspectual forms and adverbs. A unique category could be identified for 96% of the target 
verbs. To evaluate the result of the experiment, we examined the meaning of -teiru which is 
one of the most fundamental aspectual markers ... 

1 Techniques for automatically correcting words in text

Research aimed at correcting words in text has focused on three progressively more difficult
problems: (1) nonword error detection; (2) isolated-word error correction; and (3) context-
dependent word correction. In response to the first problem, efficient pattern-matching and
n-gram analysis techniques have been developed for detecting strings that do not appear in 
a given word list. In response to the second problem, a variety of general and application- 
specific spelling cor ... 

Keywords: n-gram analysis, Optical Character Recognition (OCR), context-dependent 
spelling correction, grammar checking, natural-language-processing models, neural net 
classifiers, spell checking, spelling error detection, spelling error patterns, statistical- 
language models, word recognition and correction 

This paper proposes an efficient example sampling method for example-based word sense
disambiguation systems. To construct a database of practical size, a considerable overhead 
for manual sense disambiguation (overhead for supervision) is required. In addition, the 
time complexity of searching a large-sized database poses a considerable problem 
(overhead for search). To counter these problems, our method selectively samples a 
smaller-sized effective subset from a given example set for use in wor ... 

Content-based retrieval: VideoQA: question answering on news video 
Hui Yang, Lekha Chaisorn, Yunlong Zhao, Shi-Yong Neo, Tat-Seng Chua 
November 2003 Proceedings of the eleventh ACM international conference on 
Multimedia 

Full text available: pdf(592.26 KB) Additional Information: full citation , abstract , references , citings , index
terms

When querying a news video archive, the users are interested in retrieving precise answers
in the form of a summary that best answers the query. However, current video retrieval 
systems, including the search engines on the web, are designed to retrieve documents 
instead of precise answers. This research explores the use of question answering (QA) 
techniques to support personalized news video retrieval. Users interact with our system, 
VideoQA, using short natural language questions with implicit ... 

Keywords: transcript error correction, video question answering, video retrieval, video 
summarization 

January 2002 ACM Transactions on Information Systems (TOIS), Volume 20 Issue 1

Full text available: pdf(926.20 KB) Additional Information: full citation , abstract , references , citings , index
terms , review

Keyword-based search engines are in widespread use today as a popular means for Web- 
based information retrieval. Although such systems seem deceptively simple, a considerable 
amount of skill is required in order to satisfy non-trivial information needs. This paper 
presents a new conceptual paradigm for performing search in context, that largely 
automates the search process, providing even non-professional users with highly relevant 
results. This paradigm is implemented in practice in the Intelli ... 

Keywords: Search, context, invisible web, semantic processing, statistical natural language 
processing 

The initial stage of text analysis for any NLP task usually involves the tokenization of the
input into words. For languages like English one can assume, to a first approximation, that 
word boundaries are given by whitespace or punctuation. In various Asian languages, 
including Chinese, on the other hand, whitespace is never used to delimit words, so one 
must resort to lexical information to "reconstruct" the word-boundary information. In this 
paper we present a stochastic finite-state model whe ... 

8 Automatic extraction of aspectual information from a monolingual corpus 
Akira Oishi, Yuji Matsumoto 

July 1997 

This paper describes an approach to extract the aspectual information of Japanese verb
phrases from a monolingual corpus. We classify verbs into six categories by means of the
aspectual features which are defined on the basis of the possibility of co-occurrence with 
aspectual forms and adverbs. A unique category could be identified for 96% of the target 
verbs. To evaluate the result of the experiment, we examined the meaning of -teiru which is 
one of the most fundamental aspectual markers ... 

9 Placing search in context: the concept revisited 

Lev Finkelstein, Evgeniy Gabrilovich, Yossi Matias, Ehud Rivlin, Zach Solan, Gadi Wolfman, 
Eytan Ruppin 

April 2001 Proceedings of the 10th international conference on World Wide Web 

10 A part of speech estimation method for Japanese unknown words using a statistical
model of morphology and context

Masaaki Nagata

June 1999 Proceedings of the 37th annual meeting of the Association for 
Computational Linguistics on Computational Linguistics 

Full text available: pdf(765.00 KB) Additional Information: full citation , abstract , references , citings

We present a statistical model of Japanese unknown words consisting of a set of length and 
spelling models classified by the character types that constitute a word. The point is quite 
simple: different character sets should be treated differently and the changes between 
character types are very important because Japanese script has both ideograms like Chinese 
(kanji) and phonograms like English (katakana). Both word segmentation 
accuracy and part of speech taggin ... 

11 Using decision trees to construct a practical parser 
Masahiko Haruno, Satoshi Shirai, Yoshifumi Ooyama 
August 1998 

This paper describes novel and practical Japanese parsers that use decision trees. First, we 
construct a single decision tree to estimate modification probabilities; how one phrase tends 
to modify another. Next, we introduce a boosting algorithm in which several decision trees 
are constructed and then combined for probability estimation. The two constructed parsers 
are evaluated by using the EDR Japanese annotated corpus. The single-tree method 
outperforms the conventional Japanese stochastic m ... 

12 Structural analysis of cooking preparation steps in Japanese 
Reiko Hamada, Ichiro Ide, Shuichi Sakai, Hidehiko Tanaka 

November 2000 Proceedings of the fifth international workshop on Information 
retrieval with Asian languages 

Full text available: pdf(769.13 KB) Additional Information: full citation , abstract , references 

We propose a method to create process flow graphs automatically from textbooks for 
cooking programs. This is realized by understanding context by narrowing down the domain 
to cooking, and making use of domain specific constraints and knowledge. Since it is 
relatively easy to extract significant keywords from cooking procedures, we create a domain 
specific dictionary by statistical methods, and propose a structural analysis method using 
the dictionary. In order to evaluate the ability of the p ... 

Keywords: cookbooks, domain specific dictionary, preparation steps, structural analysis 



13 A stochastic language model using dependency and its improvement by word 
clustering 

Shinsuke Mori, Makoto Nagao 
August 1998 
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In this paper, we present a stochastic language model for Japanese using dependency. The 
prediction unit in this model is an attribute of "bunsetsu". This is represented by the product 
of the head of content words and that of function words. The relation between the attributes 
of "bunsetsu" is ruled by a context-free grammar. The word sequences are predicted from 
the attribute using a word n-gram model. The spelling of an unknown word is predicted using a 
character n-gram model. This model is robust in tha ... 

14 Special issue on using large corpora: I: Introduction to the special issue on 
computational linguistics using large corpora 

Kenneth W. Church, Robert L. Mercer 

March 1993 Computational Linguistics, volume 19 issue 1 
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In this paper the text theoretical foundation of our text analysis system KONTEXT is 
described. The basic premise of the KONTEXT model is that new concepts are communicated 
by using the mechanisms of text constitution. The text model used assumes that the 
information conveyed in a text and the information describing its contextual organization 
can be structured into five layers (sentence structure, information on thematic progression, 
referential structure, conceptual representation of the text ... 

16 General-to-specific model selection for subcategorization preference 
Takehito Utsuro, Takashi Miyata, Yuji Matsumoto 
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This paper proposes a novel method for learning probability models of subcategorization 
preference of verbs. We consider the issues of case dependencies and noun class 
generalization in a uniform way by employing the maximum entropy modeling method. We 
also propose a new model selection algorithm which starts from the most general model and 
gradually examines more specific models. In the experimental evaluation, it is shown that 
both of the case dependencies and speci ... 
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17 Lexicon: Analysis of scene identification ability of associative memory with pictorial 
dictionary 

Tatsuhiko Tsunoda, Hidehiko Tanaka 

August 1994 Proceedings of the 15th conference on Computational linguistics - Volume 
1 
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Semantic disambiguation depends on a process of defining the appropriate knowledge 
context. Recent research directions suggest a connectionist approach which uses dictionaries, 
but there remain problems of scale, analysis, and interpretation. Here we focus on word 
disambiguation as scene selection, based on the Oxford Pictorial English Dictionary. We 
present results on spatial-scene identification ability using our original associative 
memory. We show both theoretical and experimental analysi ... 
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Hiroshi Kanayama, Kentaro Torisawa, Yutaka Mitsuishi, Jun'ichi Tsujii 

July 2000 Proceedings of the 18th conference on Computational linguistics - Volume 1 

Full text available: pdf(680.76 KB) Additional Information: full citation , abstract , references , citings 

This paper describes a hybrid parsing method for Japanese which uses both a hand-crafted 
grammar and a statistical technique. The key feature of our system is that in order to 
estimate likelihood for a parse tree, the system uses information taken from alternative 
partial parse trees generated by the grammar. This utilization of alternative trees enables us 
to construct a new statistical model called Triplet/Quadruplet Model. We show that this 
model can capture a certain tendency in Japan ... 

19 Japanese OCR error correction using character shape similarity and statistical 
language model 
Masaaki Nagata 
August 1998 

Full text available: pdf(686.18 KB) Additional Information: full citation , abstract , references 

We present a novel OCR error correction method for languages without word delimiters that 
have a large character set, such as Japanese and Chinese. It consists of a statistical OCR 
model, an approximate word matching method using character shape similarity, and a word 
segmentation algorithm using a statistical language model. By using a statistical OCR model 
and character shape similarity, the proposed error corrector outperforms the previously 
published method. When the baseline character recog ... 

20 Learning probabilistic subcategorization preference by identifying case dependencies 
and optimal noun class generalization level 
Takehito Utsuro, Yuji Matsumoto 

March 1997 Proceedings of the fifth conference on Applied natural language 
processing 
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This paper proposes a novel method of learning probabilistic subcategorization preference. 
In the method, for the purpose of coping with the ambiguities of case dependencies and 
noun class generalization of argument/adjunct nouns, we introduce a data structure which 
represents a tuple of independent partial subcategorization frames. Each collocation of a 
verb and argument/adjunct nouns is assumed to be generated from one of the possible 
tuples of independent partial subcategorization frames. Par ... 
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This paper describes a method for recognizing coherence relations between clauses which 
are linked by "te" in Japanese, a translational equivalent of English "and". We consider that the 
coherence relations are categories each of which has a prototype structure as well as the 
relationships among them. By utilizing this organization of the relations, we can infer an 
appropriate relation from the semantic structures of the clauses between which that relation 
holds. We carried out an experi ... 

22 Use of mutual information based character clusters in dictionary-less morphological 
analysis of Japanese 

Hideki Kashioka, Yasuhiro Kawata, Yumiko Kinjo, Andrew Finch, Ezra W. Black 
August 1998 

Full text available: pdf(469.55 KB) 
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For languages whose character set is very large and whose orthography does not require 
spacing between words, such as Japanese, tokenizing and part-of-speech tagging are often 
the difficult parts of any morphological analysis. For practical systems to tackle this problem, 
uncontrolled heuristics are primarily used. The use of information on character sorts, 
however, mitigates this difficulty. This paper presents our method of incorporating character 
clustering based on mutual information into De ... 



23 Poster papers: Discovering word senses from text 
Patrick Pantel, Dekang Lin 

July 2002 Proceedings of the eighth ACM SIGKDD international conference on 
Knowledge discovery and data mining 

Full text available: pdf(661.99 KB) Additional Information: full citation , abstract , references , index terms 

Inventories of manually compiled dictionaries usually serve as a source for word senses. 
However, they often include many rare senses while missing corpus/domain-specific senses. 
We present a clustering algorithm called CBC (Clustering By Committee) that automatically 
discovers word senses from text. It initially discovers a set of tight clusters called 
committees that are well scattered in the similarity space. The centroid of the members of a 
committee is used as the feature vector of the clus ... 

Keywords: clustering, evaluation, machine learning, word sense discovery 



24 Poster Sessions: Word extraction from corpora and its part-of-speech estimation using 
distributional analysis 

Shinsuke Mori, Makoto Nagao 

August 1996 Proceedings of the 16th conference on Computational linguistics - Volume 
2 

Full text available: pdf(360.98 KB) Additional Information: full citation , abstract , references , citings 

Unknown words are inevitable at any step of analysis in natural language processing. We 
propose a method to extract words from a corpus and estimate the probability that each 
word belongs to given parts of speech (POSs), using a distributional analysis. Our 
experiments have shown that this method is effective for inferring the POS of unknown 
words. 

25 Large-scale resources: The automatic creation of lexical entries for a multilingual MT 
system 

David Farwell, Louise Guthrie, Yorick Wilks 

August 1992 Proceedings of the 14th conference on Computational linguistics - Volume 
2 

Full text available: pdf(436.57 KB) Additional Information: full citation , abstract , references , citings 

In this paper, we describe a method of extracting information from an on-line resource for 
the construction of lexical entries for a multi-lingual, interlingual MT system (ULTRA). We 
have been able to automatically generate lexical entries for interlingual concepts 
corresponding to nouns, verbs, adjectives and adverbs. Although several features of these 
entries continue to be supplied manually we have greatly decreased the time required to 
generate each entry and see this as a promising method f ... 

26 Papers: Inherited Feature-based Similarity Measure based on large semantic hierarchy 
and large text corpus 

Hideki Hirakawa, Zhonghui Xu, Kenneth Haase 

August 1996 Proceedings of the 16th conference on Computational linguistics - Volume 
1 

Full text available: pdf(623.25 KB) Additional Information: full citation , abstract , references , citings 

We describe a similarity calculation model called IFSM (Inherited Feature Similarity 
Measure) between objects (words/concepts) based on their common and distinctive 
features. We propose an implementation method for obtaining features based on abstracted 
triples extracted from a large text corpus utilizing taxonomical knowledge. This model 
represents an integration of traditional methods, i.e., relation based similarity measure and 
distribution based similarity measure. An experiment, using our n ... 

27 Estimating understandability of software documents 
Kari Laitinen 

July 1996 ACM SIGSOFT Software Engineering Notes, volume 21 issue 4 
Full text available: pdf(878.94 KB) Additional Information: full citation , abstract , index terms 

Software developers and maintainers need to read and understand source programs and 
other kinds of software documents in their work. Understandability of software documents is 
thus important. This paper introduces a method for estimating the understandability of 
software documents. The method is based on a language theory according to which every 
software document is considered to contain a language of its own, which is a set of symbols. 
The understandability of documents written according to di ... 
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Lynn Carlson, Sergei Nirenburg 

March 1992 Proceedings of the third conference on Applied natural language 
processing 

29 New horizons in commercial and industrial AI 
Toshinori Munakata 

November 1995 Communications of the ACM, volume 38 issue 11 
Full text available: pdf(400.28 KB) Additional Information: full citation , abstract , index terms 

AI as a field has undergone rapid growth in diversification and practicality. For the past 10 
years, the repertoire of AI techniques has evolved and expanded. Scores of newer fields 
have recently been added to the traditional domains of practical AI. Although much practical 
AI is still best characterized as advanced computing rather than intelligence, applications in 
everyday commercial and industrial settings have certainly increased, especially since 1990. 
Additionally, A ... 
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experimental study of two corpora 
Xiaobo Ren, Francois Perrault 
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Kumiko Tanaka, Hideya Iwasaki 

August 1996 Proceedings of the 16th conference on Computational linguistics - Volume 
2 

Full text available: pdf(571.08 KB) Additional Information: full citation , abstract , references , citings 

A method for extracting lexical translations from non-aligned corpora is proposed to cope 
with the unavailability of large aligned corpus. The assumption that "translations of two co- 
occurring words in a source language also co-occur in the target language" is adopted and 
represented in the stochastic matrix formulation. The translation matrix provides the co- 
occurring information translated from the source into the target. This translated co-occurring 
information should resemble that of the ori ... 
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Theodora I. Noussia 

September 1997 Computational Linguistics, Volume 23 Issue 3 
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The purpose of this paper is to formally examine hyphenation as it pertains to Modern Greek 
with the aim of achieving accurate and thorough machine hyphenation. Grammar rules are 
interpreted and formally expressed in terms of regular expressions of word substrings, and 
exact hyphenation rules are derived. Vowel splitting, which traditionally is indicated in terms 
of prohibitive rather than explicit grammar rules, is examined in detail. Many ambiguities 
caused by circular definitions of the prohi ... 

33 CYC, WordNet, and EDR: critiques and responses 
Doug Lenat, George Miller, Toshio Yokoi 

November 1995 Communications of the ACM, Volume 38 Issue 11 

Full text available: pdf(106.09 KB) Additional Information: full citation , abstract , citings , index terms 

I applaud Miller's WordNet project and feel that there is much in common in our approaches, 
even though there are fundamental differences in the two expressions of that spirit. Here, I 
list the four differences I noted, closing with a crucial observation concerning the common 
spirit in our work. 

34 Association-based natural language processing with neural networks 
Kimura Kazuhiro, Suzuoka Takashi, Amano Sin-ya 

June 1992 Proceedings of the 30th annual meeting on Association for Computational 
Linguistics 

Full text available: pdf(450.05 KB) 
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This paper describes a natural language processing system reinforced by the use of 
association of words and concepts, implemented as a neural network. Combining an 
associative network with a conventional system contributes to semantic disambiguation in 
the process of interpretation. The model is employed within a kana-kanji conversion system 
and the advantages over conventional ones are shown. 

35 Part-of-speech induction from scratch 
Hinrich Schütze 

June 1993 Proceedings of the 31st annual meeting on Association for Computational 
Linguistics 

Full text available: pdf(717.90 KB) 
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This paper presents a method for inducing the parts of speech of a language and part-of- 
speech labels for individual words from a large text corpus. Vector representations for the 
part-of-speech of a word are formed from entries of its near lexical neighbors. A 
dimensionality reduction creates a space representing the syntactic categories of 
unambiguous words. A neural net trained on these spatial representations classifies 
individual contexts of occurrence of ambiguous words. The method classif ... 

36 Lexicon: Noun phrasal entries in the EDR English word dictionary 
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August 1994 Proceedings of the 15th conference on Computational linguistics - Volume 
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Akira Maeda, Fatiha Sadat, Masatoshi Yoshikawa, Shunsuke Uemura 

November 2000 Proceedings of the fifth international workshop on Information 




With the worldwide growth of the Internet, research on Cross-Language Information 
Retrieval (CLIR) is being paid much attention. Existing CLIR approaches based on query 
translation require parallel corpora or comparable corpora for the disambiguation of 
translated query terms. However, those natural language resources are not readily 
available. In this paper, we propose a disambiguation method for dictionary-based query 
translation that is independent of the availability of such scarce langua ... 

Keywords: WWW, cross-language information retrieval, mutual information, search engine 



38 XRel: a path-based approach to storage and retrieval of XML documents using 
relational databases 

August 2001 ACM Transactions on Internet Technology (TOIT), Volume 1 Issue 1 



using relational databases. In this approach, an XML document is decomposed into nodes on 
the basis of its tree structure and stored in relational tables according to the node type, with 
path information from the root to each node. XRel enables us to store XML documents using 
a fixed relational schema without any information about DTDs and also to utilize indices 
such as the B+ 

Keywords: XML query, XPath, text markup, text tagging 

Abstract (Basic) : WO 200109879 A1 

NOVELTY - A recognition candidate is modified to substitute a 
proposed word for the word fragment and the adjacent word fragments 
or words used to form the proposed word, when the word fragment can 
be combined with adjacent word fragments to form the proposed word. 
The proposed word is included in the back up dictionary of the speech 
recognition system. 

DETAILED DESCRIPTION - The recognition candidate is discarded when 
the word fragment cannot be combined with the adjacent word 
fragments to form the proposed word included in the back-up 
dictionary of the speech recognition system. The word fragment is 
included in the recognition candidate from a speech recognizer (215) . 
The speech recognizer performs speech recognition on a user utterance 
to produce one or more recognition candidates. INDEPENDENT CLAIMS are 
also included for the following: 

(a) a method of recognizing speech; 

(b) a method of generating an acoustic model of a word fragment; 

(c) a computer implemented speech recognition system; 

(d) and a computer software. 

USE - Used for expanding effective active vocabulary of a speech 
recognition system . 

ADVANTAGE - Enables increasing the effective size of the active 
vocabulary of the speech recognition system by using fragmented word 
models. Enables improving the ability of the recognizer to recognize 
less-frequently used words, since the size of the active vocabulary is 
increased. 

DESCRIPTION OF DRAWING (S) - The figure shows the block diagram of 
the speech recognition software of the speech recognition system. 
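
The substitute-or-discard step described in this abstract can be illustrated with a minimal Python sketch. It is not the patented implementation: the token representation `(text, is_fragment)`, the function name `merge_fragments`, the `max_span` limit, and the greedy longest-match strategy are all assumptions made for illustration, and fragments are treated here as spelled strings rather than acoustic models.

```python
def merge_fragments(candidate, backup_dictionary, max_span=3):
    """Replace word fragments in a recognition candidate with whole words.

    candidate: list of (text, is_fragment) tokens (hypothetical representation).
    backup_dictionary: set of words that fragments may be combined into.
    Returns the rewritten word list, or None if the candidate is discarded.
    """
    merged = []
    i = 0
    while i < len(candidate):
        replaced = False
        # Try the longest span first so multi-fragment words are preferred.
        for span in range(min(max_span, len(candidate) - i), 1, -1):
            window = candidate[i:i + span]
            # Only spans that contain at least one fragment are of interest.
            if not any(is_fragment for _, is_fragment in window):
                continue
            proposed = "".join(text for text, _ in window)
            if proposed in backup_dictionary:
                merged.append(proposed)  # substitute the proposed word
                i += span
                replaced = True
                break
        if replaced:
            continue
        text, is_fragment = candidate[i]
        if is_fragment:
            # The fragment cannot be completed: discard the whole candidate.
            return None
        merged.append(text)
        i += 1
    return merged
```

Under these assumptions, a candidate such as "the" followed by the fragments "un", "believ", "able" would be rewritten to contain "unbelievable" if that word is in the back-up dictionary, and a candidate containing a fragment that cannot be completed would be dropped.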
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Reference pattern training system giving recognition unit smaller than 
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recognition unit, selects unit, reassembles extracted unit in preceeding 
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Abstract (Basic) : EP 731447 A 

The training system takes out and compares phoneme data about 
recognition units stored in a recognition word dictionary (10) and a 
training word dictionary (11). A recognition unit is selected from 
the recognition training words stored in the training word dictionary. 
The selected recognition unit reassembles each selected recognition 
unit in a subject recognition word stored in the recognition word 
dictionary in phoneme context. 

The selected recognition unit is sent to a recognition reference 
pattern generator (14), which utilises a recognition unit which is best 
in accord in phoneme context among training data stored in a 
training data memory (12) concerning a recognition unit selected by the 
training unit selector (13). 

ADVANTAGE - Can expand scope of phoneme context while reducing 
amount of stored data and provides high recognition performance for 
speech recognition system. 
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ABSTRACT 



PROBLEM TO BE SOLVED: To provide a voice interactive system with which a 
voice having the non-clear partition of speaking can be comprehended in 
real time and immediate responding is enabled by speaking at free timing. 

SOLUTION: When any word is inputted, while referring to a dictionary 250, 
the semantic description of that word is found by an interactive context 
managing part 210, a new interactive context candidate is provided 
together with respective interactive context candidates provided up to that 
time point, a rule application part 220 obtains a further new interactive 
context candidate by applying language comprehension rules 110 to these 
interactive context candidates, a priority calculating part 230 calculates 
priority by adding these new interactive context candidates to the 
source interactive context candidates, and the interactive context managing 
part 210 outputs the interactive context candidate of the highest priority. 
This processing is repeated each time a word is inputted. 
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ABSTRACT 



PROBLEM TO BE SOLVED: To improve the selection accuracy of translated words 
in a machine translation mode without lowering the processing efficiency by 
using plural types of dictionaries including a context dictionary 
when a word that is not defined in a compound word dictionary is 
translated in a sentence. 

SOLUTION: Every input sentence is taken out at its head (110) , and the 
compound words corresponding to a word string composing a single input 
sentence are retrieved from a compound word dictionary. When each of the 
words which do not correspond to the compound words is translated 
(120), its translation is decided by a context dictionary and the 
translated word is obtained based on the translation (130). The 
translated word undergoes the translation result registering processing. 
Then it's checked whether an object word is stored in a translation result 
recording buffer as a header. If the object word is stored in the buffer, 
it's checked whether or not its translated word is stored in the buffer and 
all words which are not corresponding to the compound words are processed 
(140, 160). Then all words are translated (170). When it's decided that a 
full sentence is translated, the sentence is registered (180 to 195) after 
undergoing the retranslation effect evaluation processing. 
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ABSTRACT 

PROBLEM TO BE SOLVED: To attain high performance ambiguity cancelation by 
deleting a group of an unreliable evidence and word meaning supported 
by the evidence from a determination list and canceling semantic ambiguity 
based only on a reliable evidence. 

SOLUTION: A determination list learning part 2 calculates logarithmic 
likelihood ratio between conditional probability values in the appearance 
of respective word meanings by using information appearing in contexts 
around a word in question in an input text as evidences and sets up 
the groups of evidences arranged in the descending order and word meanings 
supported by the evidences as a determination list. Then mutual information 
quantity between each evidence and the word in question is calculated and a 
group of an evidence of which mutual information volume does not exceed a 
threshold and a word meaning supported by the evidence is deleted from 
the list and stored in a determination list storing part 3. A semantic 
ambiguity canceling part 5 successively checks whether each evidence 
described in the list appears in contexts around the word in question 
in the input text or not and outputs a word meaning supported by the 
evidence described in the list as the meaning of the word in question. 
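
The procedure in this abstract (rank evidences by log-likelihood ratio, prune those with low mutual information, then apply the first matching evidence) can be sketched in Python as follows. This is a hedged illustration rather than the method as claimed: the function names, the additive smoothing constant `alpha`, the use of the best sense against all other senses in the ratio, and the assumption that mutual information values arrive precomputed in `mi_scores` are all choices made here for brevity.

```python
import math
from collections import Counter, defaultdict


def build_decision_list(examples, mi_scores, mi_threshold=0.0, alpha=0.1):
    """Build a decision list for one ambiguous word.

    examples: iterable of (context_words, sense) pairs.
    mi_scores: evidence -> mutual information with the target word
               (assumed to be precomputed elsewhere).
    Returns [(log_likelihood_ratio, evidence, sense), ...] in descending order.
    """
    counts = defaultdict(Counter)  # evidence -> sense -> count
    for context, sense in examples:
        for evidence in set(context):
            counts[evidence][sense] += 1

    rules = []
    for evidence, by_sense in counts.items():
        # Delete evidence whose mutual information does not exceed the threshold.
        if mi_scores.get(evidence, 0.0) <= mi_threshold:
            continue
        total = sum(by_sense.values())
        best_sense, best_count = by_sense.most_common(1)[0]
        # Smoothed log-likelihood ratio: best sense versus all other senses.
        llr = math.log((best_count + alpha) / (total - best_count + alpha))
        rules.append((llr, evidence, best_sense))
    rules.sort(key=lambda rule: rule[0], reverse=True)
    return rules


def disambiguate(rules, context, default=None):
    """Return the sense of the first listed evidence that appears in context."""
    present = set(context)
    for _, evidence, sense in rules:
        if evidence in present:
            return sense
    return default
```

Because only the first matching evidence decides the output, pruning unreliable evidences changes which rule fires for a given context, which is the effect the abstract attributes to the mutual-information filter.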
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ABSTRACT 

PROBLEM TO BE SOLVED: To enable also a user inexperienced in language to 
input a suitable word during the preparation of a document. 

SOLUTION: Characters inputted through an input part 11 are successively 
stored in an input character storing area 13b, a word most close to 
a character string consisting of respective characters stored in the area 
13b is retrieved from a word dictionary stored in a dictionary 
storing area as a neighboring word in each input of a character and 
the retrieved word is sub-displayed on a sub-display area 14b of a display 
part 14. After the end of characters equivalent to one word, a word related 
to the word concerned is retrieved from the word dictionary as a relative 
word and sub-displayed on the sub-display area 14b of the display part 14. 
Thereby the contents of the sub-display can be utilized as the help of the 
character input on the way of character display, and even after the end of 
a character input for one word, the contents of the sub-display can be 
utilized as the help of the character input. 
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ABSTRACT 

PURPOSE: To speedily know a proper equivalent with small labor by 
displaying a proper equivalent of the context to a specified word in 
easy-to-see form for a user at the top or upper part of a synonym display 
part or by coloring, etc. 

CONSTITUTION: When the user specifies a document to be read through an 
input means 1, a computation program execution means 3 displays the 
necessary document in a document display area 21 in a display means 2 by 
using a document display routine 41 and the user begins to read it. If a 
word whose meaning is unknown is found, the word on the screen is 
specified with a pen input means 13. The computation program execution 
means 3 actuates a synonym display routine 42 in response to the 
specification of the word to open a synonym display area 22 on the screen 
nearby the word, and the meaning of the word stored in a 
dictionary 51 is displayed on the screen. At this time, when the word 
has plural meanings, the meaning matching the context of the passage 
is preferentially displayed at an upper part in consideration of the 
context. 
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ABSTRACT 

PURPOSE: To clearly display the correspondence of a translated word of 
second language corresponding to a word (including an idiom) of first 
language, in the bilingual display for displaying the translated word under 
the word of a sentence of first language. 

CONSTITUTION: A bilingual word dictionary 1 gives a translated word of 
second language to a word of first language. A translated word acquiring 
means 2 retrieves the bilingual word dictionary based on a 
character-string in a document of first language, and obtains a translated 
word of second language to the word of first language. A bilingual display 
means 3 adds and displays the translated word between lines in the 
vicinity of its position at every word whose translated word is 
acquired, together with the document of first language. With such a 
constitution, a range of the word to which the translated word is 
added is discriminated and displayed. 
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ABSTRACT 

PURPOSE: To appropriately translate a word by translating the word from 
the context of the designated word , and checking the part of a text. 

CONSTITUTION: When there is an unknown word during text input (editing), 
and the pertinent unknown word is desired to be translated, the word is 
indicated by an inputting device 10, and a dictionary consultation 
processor 8 is activated. Then, an analysis is executed for the range of 
the text including the word, and the appropriate meaning of the 
pertinent word is displayed. Also, a translated word is displayed 
after or under the designated word. 
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ABSTRACT 

PURPOSE: To speedily understand a document by consulting with a dictionary 
immediately when a word whose meaning can not be understood is 
encountered at the time of understanding the document, and displaying 
dictionary data nearby the word and storing the data relatively. 

CONSTITUTION: The document in object language is inputted through an input 
device 10 and the inputted document and the processing result of a 
processor 1 are displayed on a display device 6. The words in the document 
displayed on the display device 6 are indicated by a word indication device 
and looked up in the dictionary by a dictionary retrieval device 8. 
Dictionary information on the retrieved words is displayed on the display 
device 6 and also displayed nearby the words by a display controller 5. 
The words and desired information are related and stored in a memory 2. 
Consequently, the document can speedily be understood. 
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ABSTRACT 

PURPOSE: To obtain a powerful polysemy eliminating function at an early 
stage of analysis by using the limit conditions of adjacency, the stage 
information, and the priority information which are stored in a dictionary. 

CONSTITUTION: A set between the conditions and the intensity of the 
attributes of the meaning of words which are difficult to be placed 
adjacent to each other in a sentence and a set of the conditions and the 
intensity of the attributes of the meaning of words adjacent to each 
other in a sentence are registered into a dictionary 102. At the same 
time, the priority information is also registered in the dictionary 102 
together with the stage information which decides a stage where the 
polysemy can be eliminated. An input sentence dividing part 101 divides an 
input sentence into sections of adjacent words by reference to the 
dictionary 102. An adjacent condition application part 103 deletes the 
sets satisfying the conditions of the meaning of words which are 
difficult to be placed adjacent to each other to plural candidate words of 
the adjacent word sections and the sets which do not satisfy the
conditions of the words adjacent to each other in the order of stronger
conditions. Then a priority application part 104 selects the meaning of 
words having high priority to the polysemy that cannot be eliminated at 
the subsequent stages based on the stage information. As a result, it is 
possible to obtain a powerful polysemy eliminating function in an early 
stage of analysis. 
ABSTRACT

PURPOSE: To presume the logical structure and sentence structure of a
composition along the intention of the writer of the composition before
making the syntax analysis and semantic analysis, by providing a context
form holding mechanism, context structure collating mechanism, and context
structure presuming mechanism.

CONSTITUTION: An inputted composition read by an input composition reading 
section 1 is transmitted to a form element analyzing section 2 where the 
composition is divided into words on the basis of a dictionary 3, and the
contents of the dictionary 3 are assigned to each word and transmitted
to a context analyzing section 4. The context analyzing section 4 is
composed of a context form holding mechanism 5, a context structure collating
mechanism 6, and a context structure presuming mechanism 7, and the context
structure collating mechanism 6 detects candidates of the context
structure of the inputted composition by referring to the information of
vocabularies indicating a context stored in the context form holding
mechanism 5. The context structure presuming mechanism 7 presumes the most
probable context structure out of the received context structure candidates 
and a syntax analyzing section 8 carries out a syntax analysis in 
accordance with the presumed context structure. A semantic analyzing 
section 9 extracts the meaning of the inputted composition on the basis of 
the structure of the inputted composition obtained as a result of the 
syntax analysis. 
Handheld electronic book reader annotating method for use in educational
environment, involves creating context for defined term in 
electronic document by associating selected portion of electronic 
document with defined term 

Patent Assignee: INT BUSINESS MACHINES CORP (IBMC)

Inventor: BARSNESS E L; SANTOSUOSSO J M 

Number of Countries: 001 Number of Patents: 001 

Patent Family: 

Patent No Kind Date Applicat No Kind Date Week 

US 20040201633 A1 20041014 US 2001951363 A 20010913 200474 B

Priority Applications (No Type Date) : US 2001951363 A 20010913 
Patent Details: 

Patent No Kind Lan Pg Main IPC Filing Notes 
US 20040201633 A1 14 G09G-005/00

Abstract (Basic) : US 20040201633 A1

NOVELTY - The method involves selecting a portion of an electronic
document displayed in a handheld electronic book reader in response to
user input. A context is created for a defined term in the
electronic document by associating the selected portion of the
electronic document with the defined term. The definition of the
defined term and the context associated with the term is
displayed in response to user input on a display (14).

DETAILED DESCRIPTION - INDEPENDENT CLAIMS are also included for the 
following: 

(A) a method of displaying an electronic document on a handheld 
electronic book reader 

(B) a method of monitoring usage of an electronic document. 

USE - Used for annotating with a handheld electronic book reader 
that is utilized in educational or classroom environment by student and 
instructor, and for general entertainment or informational purpose. 

ADVANTAGE - The method enhances the collaborative and annotative
capabilities of electronic book readers and hence provides useful
feedback on readers' use of the electronic documents, e.g. to improve
lesson plans associated with such documents, to monitor student
activity, and to identify students needing supplemental assistance. The
method provides significant benefits in terms of improving the
educational process for both students and instructors.

DESCRIPTION OF DRAWING(S) - The drawing shows a top plan view of a
handheld electronic book reader incorporating annotation and usage
tracking capabilities using the handheld electronic book reader
annotating method.

Reader (12) 

Touch screen display (14) 
Power button (15) 
Menu button (16) 
Electronic book reader (20) 
pp; 14 DwgNo 1/7 
Term context learning method in terminological system, involves
selecting focal point category that identifies concept for category 
relationship for term to learn from base term, based on assigned weight 
values 

Patent Assignee: ORACLE CORP (ORAC-N)
Number of Countries: 001 Number of Patents: 001
Patent Family: 

Patent No Kind Date Applicat No Kind Date Week 

US 6415283 B1 20020702 US 98170895 A 19981013 200274 B

Priority Applications (No Type Date) : US 98170895 A 19981013 
US 6415283 B1 20 G06F-017/30

Abstract (Basic) : US 6415283 B1

NOVELTY - The selected category nodes representing concepts for 
category relationships, are assigned with weight values associated with 
corresponding base terms. A cluster of categories of nodes is selected 
based on assigned values and category relationships. A focal point 
category identifying a concept of a term to learn from a base term 
based on assigned values, is selected for the cluster. 

DETAILED DESCRIPTION - An INDEPENDENT CLAIM is included for 
computer readable medium storing term context learning program. 

USE - Used to learn the context or meaning of terms by
identifying categories from a knowledge catalog in terminological
learning system. Used to learn the context of the terms, also for
use in control theory, medical diagnostics, criminal profile and fraud
detection.

ADVANTAGE - Terms are mapped to categories of a classification
system and the clustering techniques are used to identify categories in
the system that best reflect the terms input to the terminological
system.

DESCRIPTION OF DRAWING(S) - The figure shows the block diagram
illustrating the cluster processing.
pp; 20 DwgNo 1/9
Complicated context correlation processing technique
Abstract (Basic) : CN 1180203 A

NOVELTY - The present invention includes the following technical
steps: 1. embedding context-dependent information and
context-dependent operation in rule and dictionary, the form of rule
is header -- context-dependent function, right portion and conversion
body; the form of every word in the dictionary is: entry word,
characteristic set, context-dependent function and version; 2. for
every rule, making header matching first, if the matching is
successful, executing context-dependent function of rule to determine
context-dependent condition of current header pattern, if said
condition is tenable, inducing the content in the current pattern; and
3. in a similar way to rule processing mode, making context-dependent
processing of words. By adopting data and operation integrated
technology, it effectively solves the complex context-dependent
processing problem.
DwgNo 0/0 
Keyword extracting method for use in document search processing -
involves semantically classifying corresponding keyword extracted, 
based on context of speech and meaning of specific word in input 
sentence 

Patent Assignee: NIPPON TELEGRAPH & TELEPHONE CORP (NITE)

Number of Countries: 001 Number of Patents: 001
Patent No Kind Date Applicat No Kind Date Week
Priority Applications (No Type Date) : JP 9853889 A 19980305
Patent Details: 
Abstract (Basic) : JP 11250097 A

NOVELTY - Importance degree of a specific word in an input
sentence is judged. Based on the context of the speech and the
meaning of the word, a corresponding keyword is extracted and
classified semantically. Based on the importance degree of the
classified keyword, the keyword of a preset number is output. DETAILED
DESCRIPTION - An INDEPENDENT CLAIM is also included for keyword
extracting apparatus.

USE - For use in document search processing.

ADVANTAGE - Since importance degree of keyword is decided, by 
frequency of word, search accuracy is raised. 
Dwg. 1/12 
Input information based context processing apparatus for language
processor, machine translator, word processor - has matching relation 
judging unit which detects whether matching data satisfy predefined 
conditions and stores them, in context information memory unit 
Abstract (Basic) : JP 11250060 A

NOVELTY - A search unit (5) searches for matching data in the
dictionary (4) and stores it in memory (6). An updating unit (7)
updates the stored context data based on an indication from the
controller. A matching relation judging unit detects whether the
matching data satisfies specific conditions, based on which it is
stored in context information memory (9). DETAILED DESCRIPTION - A
sentence is input and stored in a memory (2). A controller (3) outputs
context information indicating sequentially read words of input
sentence, based on which a dictionary (4) stores matching data. The
matching data has information about relationship between the group of
words that appear together in sentence.

USE - In language processor, machine translator, word
processor, interactive processor, etc.

ADVANTAGE - Accuracy of context information is improved and exact
information is output. DESCRIPTION OF DRAWING(S) - The figure shows
block diagram of context processing apparatus. (2) Memory; (3)
Controller; (4) Dictionary; (5) Search unit; (6) Memory; (7) Updating
unit; (9) Context information memory.
Context processing apparatus in machine translation apparatus,
interactive processing apparatus, word processor - includes updating 
Priority Applications (No Type Date) : JP 9820572 A 19980202
Patent Details:
JP 11219360 A 9 G06F-017/27

Abstract (Basic) : JP 11219360 A 

NOVELTY - The updating processing unit (7) updates context
information stored in the memory (9) only when the words in the
context information memory correspond to coincidence data stored in
memory (6) after reading all the words in input sentence. DETAILED
DESCRIPTION - The memory (2) stores input sentence. The dictionary
(4) stores coincidence data. The searching unit searches coincidence
data from the coincidence dictionary corresponding to search key. The
memory (6) stores the searched coincidence data. The updating
processing unit updates the context information. The judging unit (8)
judges if the group of arbitrary words is stored in the coincidence
dictionary as coincidence data. The context information memory
stores context information along with corresponding coincidence
data.

USE - For context processing in machine translation apparatus,
interactive processing apparatus, word processor.

ADVANTAGE - Facilitates processing of context information with high
efficiency by confirming predefined relationship between coincidence
data memory and context data memory. DESCRIPTION OF DRAWING(S) - The
figure depicts functional block diagram of context processing
apparatus. (2,6,9) Memories; (4) Dictionary; (7) Updating processing
unit; (8) Judging unit.
Title: Efficient dictionary access method for morphological analysis

Author: Ando, Kazuaki; Tsuji, Takako; Fuketa, Masao; Aoe, Jun-ichi
Corporate Source: Univ of Tokushima, Tokushima-Shi, Jpn

Conference Title: Proceedings of the 1998 IEEE International Conference 
on Systems, Man, and Cybernetics. Part 3 (of 5) 

Conference Location: San Diego, CA, USA Conference Date: 
19981011-19981014 

Sponsor: IEEE 

E.I. Conference No.: 49610 

Source: Proceedings of the IEEE International Conference on Systems, Man
and Cybernetics 3 1998. IEEE, Piscataway, NJ, USA, 98CB36218. p 2876-2881
Publication Year: 1998
CODEN: PICYE3 ISSN: 1062-922X
Language: English 

Document Type: CA; (Conference Article) Treatment: G; (General Review) 
Journal Announcement: 9903W5 

Abstract: This paper proposes an efficient dictionary access method for
morphological analysis of oriental languages by extending Aho and
Corasick's pattern matching machine. The proposed method is a simple and
efficient algorithm to find all possible substrings in an input sentence
during a single pass. It stores the relations of grammatical
connectivity of adjacent words in the output functions. Therefore,
the costs of checking connections between the adjacent words can be
reduced by using the connectivity relations. Furthermore, the construction
method of the relations of grammatical connectivity is described. Finally,
the proposed method is verified by theoretical analysis and an experimental
estimation is supported by the computer simulation with a 100,000-word
dictionary. From the simulation results, it turns out that the proposed
method was 49.9% faster (CPU time) than the traditional trie approach. As
for the number of candidates for checking connections, it was 25.5% less
than that of the original morphological analysis. (Author abstract) 11
Refs.

Descriptors: Natural language processing systems; Pattern matching;
Pattern recognition systems; Algorithms; Functions; Computer simulation; 
Mathematical morphology 

Identifiers: Dictionary access method 
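
A minimal sketch of the single-pass dictionary lookup that the abstract above builds on, assuming the textbook Aho-Corasick automaton only. It does not implement the paper's extension (storing grammatical connectivity in the output functions), and the function names build_automaton and find_all are hypothetical.

```python
from collections import deque


def build_automaton(words):
    """Build goto, failure and output tables for a set of dictionary words."""
    goto = [{}]   # goto[state][char] -> next state
    fail = [0]    # failure link per state
    out = [[]]    # dictionary words recognised on reaching each state
    for word in words:
        state = 0
        for ch in word:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append([])
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].append(word)
    # Breadth-first pass: depth-1 states keep failure link 0.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            # Inherit words that end at the failure state (proper suffixes).
            out[nxt] = out[nxt] + out[fail[nxt]]
    return goto, fail, out


def find_all(text, automaton):
    """Return (start_index, word) for every dictionary word found in text."""
    goto, fail, out = automaton
    state = 0
    matches = []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for word in out[state]:
            matches.append((i - len(word) + 1, word))
    return matches
```

Each input character is consumed once, so all dictionary entries occurring anywhere in the sentence are reported in one left-to-right scan. This is the property that makes the approach attractive for languages written without word delimiters.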
Language: English Document Type: Conference Paper (PA)
Treatment: Theoretical (T) 

Abstract: The notion of context is indispensable in discussions of
meaning, but the word context has often been used in conflicting
senses. In logic, the first representation of context as a formal object
was by the philosopher C.S. Peirce; but for nearly eighty years, his
treatment was unknown outside a small group of Peirce aficionados. In the
early 1980s, three new theories included related notions of context:
Kamp's (1981) discourse representation theory; Barwise and Perry's (1983)
situation semantics; and Sowa's (1984) conceptual graphs, which explicitly
introduced Peirce's approach to the AI community. More recently, John
McCarthy and his students have begun to use a closely related notion of
context as a basis for organizing and partitioning knowledge bases. Each of
the theories has distinctive, but complementary ideas that can enrich the
others, but the relationships between them are far from clear. This paper
analyzes the semantic foundations of these theories and shows how
McCarthy's ist(c, p) predicate can be interpreted in terms of the semantic
notions underlying the others. (24 Refs)

Subfile: C 

Descriptors: artificial intelligence; computational linguistics; formal 
logic 

Identifiers: syntax; semantics; contexts; formal logic; meaning; AI;
discourse representation; situation semantics; conceptual graphs;
artificial intelligence; knowledge bases; semantic foundations; semantic
notions
Abstract: Describes the origins, nature and development of the COFREL
(Computerized Old French-English Lexicon) project, whose purpose is the
production of an Old French-English dictionary. In part, it is the story
of how a team of near-illiterates (in computing terms) conceived of a
project in ignorance of both the true potentials of information technology 
in lexicology and lexicography, and of the dangers of that very ignorance. 
The paper summarizes the software and hardware employed as the project has 
progressed, as well as some of the computing techniques used in the 
collection, storage and preparation of data. It aims to illustrate how, in 
a long-term project, time and effort can sometimes be misdirected, and how 
a research strategy should remain flexible enough to take into account 
technological progress and increasing technical competence on the part of 
the researchers. (0 Refs) 
Subfile: C 
Identifiers: COFREL project; French-English dictionary; Computerized
Abstract: The paper addresses the problem of how to identify the intended
meaning of individual words in unrestricted texts, without necessarily
having access to complete representations of sentences. To discriminate
senses, an understander can consider a diversity of information, including
syntactic tags, word frequencies, collocations, semantic context,
role-related expectations, and syntactic restrictions. However, current
approaches make use of only small subsets of this information. The author
describes how to use the whole range of information. The discussion 
includes how the preference cues relate to general lexical and conceptual 
knowledge and to more specialized knowledge of collocations and contexts. 
She describes a method of combining cues on the basis of their individual 
specificity, rather than a fixed ranking among cue-types. She also 
discusses an application of the approach in a system that computes sense 
tags for arbitrary texts, even when it is unable to determine a single 
syntactic or semantic representation for some sentences. (45 Refs) 
Class Codes: C6180N (Natural language processing); C6170 (Expert systems)
DIALOG(R) File 2:INSPEC

(c) 2005 Institution of Electrical Engineers. All rts. reserv.

00987162 INSPEC Abstract Number: C76030546
Title: The data dictionary/directory (DD/D)

Author(s): Delport, L.

Journal: Informatie vol.18, no. 7-8 p. 446-55 

Publication Date: July-Aug. 1976 Country of Publication: Netherlands 

CODEN: INFTCR ISSN: 0019-9907 

Language: Dutch Document Type: Journal Paper (JP) 
Treatment: Practical (P) 

Abstract: Data dictionaries/directories (DD/D) are defined and
described. The main features include the compilation of references with
unambiguous listings and facilities for cross-indexing. Potential users are
listed along with the associated activities. The structure of a typical
DD/D is a hierarchical system with segments for catalogue components. The
KWIC index (key-word-in-context) is applicable and COPY routines are
available. About six examples are given of commercially available data
dictionaries along the lines discussed. A useful feature is the inclusion
of a glossary to elaborate special terms used in the text, e.g.
HIDAM=hierarchical indexed direct access method.
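
The KWIC (key-word-in-context) index mentioned in the abstract can be illustrated with a short sketch. This is a generic illustration, not the layout used by any of the data dictionary products described; the names kwic_entries and format_kwic, the stopword handling, and the column width are assumptions.

```python
def kwic_entries(lines, stopwords=frozenset(), width=20):
    """Return (key, left_context, keyword, right_context) tuples sorted by key."""
    entries = []
    for line in lines:
        words = line.split()
        for i, word in enumerate(words):
            key = word.strip(".,;:()").lower()
            if not key or key in stopwords:
                continue
            # Keep only the context nearest the keyword on each side.
            left = " ".join(words[:i])[-width:]
            right = " ".join(words[i + 1:])[:width]
            entries.append((key, left, word, right))
    entries.sort(key=lambda entry: entry[0])
    return entries


def format_kwic(entries, width=20):
    """Right-align the left context so that keywords line up in one column."""
    return [f"{left:>{width}} {word} {right}" for _, left, word, right in entries]


if __name__ == "__main__":
    titles = ["The data dictionary directory", "Facilities for cross-indexing"]
    for row in format_kwic(kwic_entries(titles, stopwords={"the", "for"})):
        print(row)
```

Every non-stopword becomes an index entry, so a title is reachable under each of its significant words while still being shown with its surrounding text.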



File 275:Gale Group Computer DB(TM) 1983-2005/Jul 07
         (c) 2005 The Gale Group
File 621:Gale Group New Prod.Annou.(R) 1985-2005/Jul 08
         (c) 2005 The Gale Group
File 636:Gale Group Newsletter DB(TM) 1987-2005/Jul 07
         (c) 2005 The Gale Group
File 16:Gale Group PROMT(R) 1990-2005/Jul 07
         (c) 2005 The Gale Group
File 160:Gale Group PROMT(R) 1972-1989
         (c) 1999 The Gale Group
File 148:Gale Group Trade & Industry DB 1976-2005/Jul 08
         (c) 2005 The Gale Group
File 624:McGraw-Hill Publications 1985-2005/Jul 08
         (c) 2005 McGraw-Hill Co. Inc
File 15:ABI/Inform(R) 1971-2005/Jul 08
         (c) 2005 ProQuest Info&Learning
File 647:CMP Computer Fulltext 1988-2005/Jun W3
         (c) 2005 CMP Media, LLC
File 674:Computer News Fulltext 1989-2005/Jul W1
         (c) 2005 IDG Communications
File 696:DIALOG Telecom. Newsletters 1995-2005/Jun 20
         (c) 2005 The Dialog Corp.
File 369:New Scientist 1994-2005/May W2
         (c) 2005 Reed Business Information Ltd.
File 810:Business Wire 1986-1999/Feb 28
         (c) 1999 Business Wire
File 813:PR Newswire 1987-1999/Apr 30
         (c) 1999 PR Newswire Association Inc
File 610:Business Wire 1999-2005/Jul 08
         (c) 2005 Business Wire.
File 613:PR Newswire 1999-2005/Jul 08
         (c) 2005 PR Newswire Association Inc



Set     Items     Description
S1      114516    DICTIONAR??? OR GLOSSAR???
S2      8423908   WORD? ? OR KEYWORD? ? OR TERM? ?
S3      226206    S2(5N)(DEFIN??? OR DEFINITION? ? OR MEANING)
S4      18051     S2(5N)CONTEXT???
S5      320063    (ADJACENT OR NEXT OR NEAR OR NEARBY OR CLOSE OR AROUND OR
                  SAME()SENTENCE OR PROXIMAL? OR PROXIMITY OR (WITHIN OR
                  IN)()RANGE OR VICINIT??? OR ADJOIN??? OR NEIGHBOR??? OR
                  BESIDE? ? OR SURROUND???)(5W)S2
S6      50893     (CONTEXT OR S5)(7N)(STOR??? OR SAV??? OR ADD??? OR
                  INSERT??? OR INCORPORAT??? OR INCLUD??? OR SUBMIT???? OR
                  ENTER??? OR COPY??? OR COPIE? ?)
S7      513       (S1 OR S3)(50N)S4:S5(50N)S6
S8      10447     S1(5N)(BUILD??? OR BUILT OR CONSTRUCT??? OR CREAT??? OR
                  GENERAT??? OR PRODUC??? OR ESTABLISH?)
S9      2542      (PERSONAL OR CUSTOM? OR ALTERNATE OR ALTERNATIVE)(3W)S1
S10     28        (S1 OR S3)(50N)S4:S5(50N)S6(50N)S8:S9
S11     22        RD (unique items)
S12     1106      USER(2W)S1
S13     12        (S1 OR S3)(50N)S4:S5(50N)S6(50N)S12
S14     6         RD (unique items)
S15     102       S6(7N)S1
S16     52        S15(50N)S4:S5
S17     38        RD (unique items)
S18     23        S17 NOT (S11 OR S14 OR PY=2002:2005)
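
The search sets above rely on proximity operators. As a rough illustration only, the sketch below assumes the commonly documented reading of these operators: (nN) matches two terms separated by at most n intervening words in either order, and (nW) requires the stated order. The helper names are hypothetical, and truncation is simplified to open-ended prefix matching, which ignores the character limits that the `?` symbols impose in the real system.

```python
import re


def tokenize(text):
    """Lower-case the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def positions(tokens, prefix):
    """Indices of tokens starting with prefix (simplified truncation)."""
    return [i for i, token in enumerate(tokens) if token.startswith(prefix)]


def near(tokens, a, b, n, ordered=False):
    """True if a and b occur with at most n words between them.

    ordered=False approximates (nN); ordered=True approximates (nW),
    where a must come before b.
    """
    for i in positions(tokens, a):
        for j in positions(tokens, b):
            gap = j - i if ordered else abs(j - i)
            if 1 <= gap <= n + 1:
                return True
    return False


if __name__ == "__main__":
    doc = tokenize("The personal dictionary stores each word in context.")
    print(near(doc, "word", "context", 5))           # like WORD(5N)CONTEXT
    print(near(doc, "personal", "dictionar", 3, ordered=True))  # like PERSONAL(3W)DICTIONAR?
```

A production retrieval system would evaluate such operators against a positional inverted index rather than by scanning token lists; the nested loop here is only meant to make the matching rule explicit.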
02429471 SUPPLIER NUMBER: 63975291 (USE FORMAT 7 OR 9 FOR FULL TEXT)

A CD-ROM for Word Lovers. (Random House Webster's Unabridged
Dictionary) (Software Review) (Evaluation)

Pack, Thomas 
Link-Up, 17, 4, 36 
July, 2000 

DOCUMENT TYPE: Evaluation ISSN: 0739-988X LANGUAGE: English 

RECORD TYPE: Fulltext 

WORD COUNT: 1343 LINE COUNT: 00102 

...up the meaning of logophile and 314,999 other words. It's Random
House Webster's Unabridged Dictionary on CD-ROM, which includes content
from the second edition of the printed Random House Webster's Unabridged
Dictionary, first published in 1987. The electronic version, published
last September, has been updated with hundreds of new...

...the biographic and geographic entries--have been updated.

Random House Webster's is the only unabridged American dictionary
on CD-ROM, according to the publisher, and it includes several features
besides definitions:

* You can listen. . . 

...more than 120,000 audio pronunciations, including variant pronunciations 
of the same word. 

* Over 2,400 entries include a link to a map or illustration. 

* Besides searching for the main entry words, you can search for
words within the definitions.

* Entries can be bookmarked for quick retrieval.

* You can print entries or copy and paste them into another document.

* Access to the dictionary can be integrated within WordPerfect,
Microsoft Word, and other applications.

* You can create your own user dictionary.

Another nice feature--especially if English isn't your native
language--is that the text in the...

...and dialog boxes can be displayed in English, French, or Spanish. 
Overall, Random House Webster's Unabridged Dictionary on CD-ROM is an 
excellent resource for anyone who needs to consult a dictionary often. 
Search options 

The main interface is easy to use and somewhat intuitive. When you 
start typing. . . 

11/3, K/2 (Item 2 from file: 275) 
Computer, take a memo. (speech recognition and dictation software for PCs)
(PC Tech Tutor) (Technology Information) (Column)

Randall, Neil

PC Magazine, v17, n1, p235(2)
Jan 6, 1998 

DOCUMENT TYPE: Column ISSN: 0888-8507 LANGUAGE: English 

RECORD TYPE: Fulltext; Abstract 

WORD COUNT: 2294 LINE COUNT: 00186

...to do is to train it--that is, provide it with audio samples that
correspond to the built-in dictionary and acoustic models. During a
typical training routine, the software presents prose passages on the
screen, and...

...As you finish each passage, the software stores your voice's acoustic
patterns in relation to the words of the passage in context. When the
training is complete the software has a reasonably detailed acoustic model 
of your voice; when you dictate original documents, it draws on that 
acoustic model by comparing your speech with the words or phrases in a 
context similar to that recorded during the training. 
Note that if more than one person will be using. . . 
An AI-based approach to machine translation in Indian languages.
(artificial intelligence)

Raman, S.; Alwar, N.

Communications of the ACM, v33, n5, p521(7)
May, 1990 

ISSN: 0001-0782 LANGUAGE: ENGLISH RECORD TYPE: FULLTEXT; ABSTRACT 

WORD COUNT: 5135 LINE COUNT: 00433 

...As in this example, a search for a match with a query word is
performed. The query dictionary, which contains all the query-related
words, is used for the search, and this returns a success...

...as In. The next item in the prediction list is the agent. The search in
the pronouns dictionary identifies the agent. This builds the semantic
representation gradually and ends up with a phrase level representation as
Figure 3 shows.
The...

...slots have a number of predefined sub-slots.

The analyzer then identifies "Delhi" as the destination station.
Next, when it encounters the word 'express,' it creates a prediction
list and adds the train name to it. Then, using the stations dictionary,
it checks whether the previous word is a valid name of a train. If it
encounters a...

11/3, K/4 (Item 4 from file: 275) 

DIALOG(R) File 275:Gale Group Computer DB(TM)
01352501 SUPPLIER NUMBER: 08183498 (USE FORMAT 7 OR 9 FOR FULL TEXT)

Spellbound? (SpellCode programming language spell checker) (Product
announcement)

Kodama, David

Data Based Advisor, v8, n1, p131(1)
Jan, 1990

DOCUMENT TYPE: product announcement ISSN: 0740-5200 LANGUAGE:
ENGLISH RECORD TYPE: FULLTEXT

WORD COUNT: 215 LINE COUNT: 00018

...text displayed to users, or it can check the entire program file.
It comes with an English dictionary and a dictionary of common computer
terms. SpellCode understands dBASE III PLUS, dBASE IV, FoxBASE+, FoxPro,
Clipper (and most Clipper libraries), dBXL, Quicksilver, C, Pascal, BASIC,
R:BASE, Paradox, and DOS file keywords.

You can build personal dictionaries for specific industries.
SpellCode can check the contents of character and memo fields and
understands DBF structures...

...also check Lotus and Symphony worksheets. SpellCode's interface has a
dual window that can display the context in which the words appear,
suggested corrections, "fill-in-the-blanks" forms, pop-up selection windows
for setting program options, and context-sensitive help screens.

SpellCode costs $99.95 including dictionaries, support utilities,
and the manual. It requires DOS 2.0 or later and 256K RAM; a hard...
01293495 SUPPLIER NUMBER: 07172278 (USE FORMAT 7 OR 9 FOR FULL TEXT)

Thunder II en route to the shelves: spelling checker is modular, fast. 
(Electronic Arts Inc.) (product announcement) 

Norr, Henry 

MacWEEK, v3, n14, p14(1)
April 4, 1989 

DOCUMENT TYPE: product announcement ISSN: 0892-8118 LANGUAGE: 

ENGLISH RECORD TYPE: FULLTEXT; ABSTRACT 

WORD COUNT: 371 LINE COUNT: 00030

...spelling checker you can buy for the Macintosh," checks
approximately 100 words per second against a main dictionary of 86,000
words plus one or more user dictionaries. It installs itself
automatically whenever an application or desk accessory specified in the
Thunder II Control Panel...

...a user-specified beep sound any time an unrecognized word is typed.

* Selection checking of text already entered, with the suspect
word shown in context along with the alternatives Thunder suggests.

...Thunder II detects double words and some capitalization and
punctuation mistakes as well as basic spelling. It also has an
abbreviation-expanding glossary and a batch search/replace function that
can make multiple changes to a document in one operation...

...able to bomb it, and that's what my job is."

"It means I need only to create one set of personal dictionaries
rather than several," said John Kendrick, a professor of sociology at
Bucknell University in Lewisburg, Pa. "And...
Word Takes Another Forward Stride.

Burns, D. / Venit, S.

PC Magazine, v4, n13, p151-153

June 25, 1985

DOCUMENT TYPE: evaluation ISSN: 0888-8507 LANGUAGE: ENGLISH

RECORD TYPE: FULLTEXT; ABSTRACT 

WORD COUNT: 2228 LINE COUNT: 00169 

. . . Spell has always been an excellent program, but this new release 

includes two major advances. The Standard dictionary now contains over 
80,000 words, and Spell now runs directly from the Word menu. 

The original Spell used a dictionary licensed from Oasis Systems,
the developer of Word Plus, but the new version's dictionary was
created especially for Microsoft. It includes many more prefixes,
suffixes, derived words, and proper names such as states...

...first saves the file you've been working on, then reads the file, and
then reads the dictionary to check the spelling. When this process is
completed, Spell reports back to you how many words...

...word or similar words. You can also choose to ignore the word and go on
to the next one or to add the word to either the program's standard
dictionary or your own dictionary.

Spell has five additional programs that can be run either directly 
from within. . . 
All in the Family: The Perfect Components.

Raskin, R.; Christian, K.

PC Magazine, v4, n10, p223-224

May 14, 1985

DOCUMENT TYPE: evaluation ISSN: 0888-8507 LANGUAGE: ENGLISH

RECORD TYPE: FULLTEXT; ABSTRACT 

WORD COUNT: 4865 LINE COUNT: 00376 

...your way through the list, you can instruct PS to ignore a word,
add it to the dictionary, or mark it in the text. Then, when Perfect
Writer is started, it works through the misspelled words in context.

When you add words to the dictionary supplied with Perfect
Speller, you increase the chance that the program will miss an incorrect
word. Therefore, rather than adding a few hundred words to the Speller
dictionary, it is better to create your own dictionaries for
specialized topics. However, PS searches only the dictionary each time it
is invoked. It would be preferable if it could use the main dictionary in
conjunction with a customized dictionary. Dual dictionaries make it
easier to tell a spelling checker about technical terms without
adulterating the main dictionary.

Perfect Writer's literature boasts that the Speller uses a
50,000-word dictionary. Perhaps they have invented sub-bit storage
methods. Their dictionary file is 22,000 bytes long. That works out to
about 4 bits for each word. Even with the best compression techniques, a
50,000-word dictionary should require over 100,000 bytes of storage.
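
The storage arithmetic in the paragraph above can be checked with a few lines. This is only a worked calculation using the figures quoted in the excerpt; the function name bits_per_word is hypothetical.

```python
def bits_per_word(file_bytes, word_count):
    """Average number of bits of storage available per dictionary word."""
    return file_bytes * 8 / word_count


if __name__ == "__main__":
    # Figures quoted in the review: a 22,000-byte file for 50,000 words.
    print(bits_per_word(22_000, 50_000))    # 3.52, i.e. "about 4 bits"
    # The reviewer's 100,000-byte lower bound for the same word count.
    print(bits_per_word(100_000, 50_000))   # 16.0, i.e. two bytes per word
```

At 3.52 bits per word the file could not even assign each of 50,000 entries a distinct code, which would need about 16 bits, so the reviewer's skepticism follows directly from the numbers.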

Perfect Thesaurus 

An interesting extra in the Writer package...
CorrectStar.

Kennedy, D.

PC Week, v1, n30, p55-56
July 31, 1984

DOCUMENT TYPE: evaluation ISSN: 0740-1604 LANGUAGE: ENGLISH

RECORD TYPE: FULLTEXT; ABSTRACT

WORD COUNT: 1153 LINE COUNT: 00089

. . . automatic right justification, soft hyphenation and some
"housekeeping" matters, such as designating the drives where the various
dictionaries and files reside.

When the spelling checker is started, CorrectStar loads an internal
dictionary into RAM. It...

...the main dictionary, but which a user expects will appear frequently in 
his or her documents. The personal dictionary can contain as many as 
1,500 words, or about 18K bytes. 

Many personal dictionaries can be created and stored on the 
same disk, provided there is sufficient storage space. Only one can be used 



...keystroke, choose one of six options: Correct as suggested, correct
globally (throughout the document), look at the next or previous
suggestion, add the word to the personal dictionary so it will
remain a permanent entry, bypass the word in this one instance or ignore
the . . .

...types "hear and there" will not be corrected since the misspelled words
are, in fact, correctly spelled words in a different context.

However, CorrectStar's ability to catch suspect words and suggest
alternatives is impressive. For example, anyone who...
The Random House Proofreader.

Meilach, D.Z.

Interface Age, v8, n5, p24
May, 1983 

ISSN: 0147-2992 LANGUAGE: ENGLISH RECORD TYPE: ABSTRACT 

. . .ABSTRACT: and can be used with WordStar and other word processing
programs. This program will display misspelled words in context, list
misspelled words, store a custom document dictionary and provide
dictionary reference help for spelling correction. Program weaknesses
include the dictionary help choices and some error detection features.
Samples of the initial menu, word-flagging and the options...
MS-DOS FILE CONVERSION HIGHLIGHTS FEATURES OF PANASONIC'S 1991 DESKTOP WORD
PROCESSORS

News Release, p1
Oct 2, 1991

Language: English Record Type: Fulltext
Document Type: Magazine/Journal; Trade
Word Count: 665

. . . ever to operate, Word Count, and Double Word Count, which identifies
double words in the text. Features include insert, delete, word search,
next word search, word replace, block move/delete/copy, word wrap, format
memory, automatic pagination and much more.

In addition, both offer a number of other. . .

. . .made with a touch of the button. Up to 120 words can also be added to
a built-in user dictionary and up to 2,000 characters can be stored in
Phrase Memory.

(2) Spelling programs developed and copyrighted by Houghton Mifflin
Company, publishers of The American Heritage Dictionary.

Both word processors also feature built-in Thesaurus (3), with
approximately 45,000 reference words and 500,000 synonyms. Move the
cursor to . . .
JAKE GELLER'S SPELL CHECKING ENGINE (TM) ADDS SPELL CHECKING TO CLIPPER AND
C APPLICATIONS

News Release, p1
Nov 1, 1990

Language: English Record Type: Fulltext
Document Type: Magazine/Journal; Trade
Word Count: 937

... be a seamless part of existing applications."

The Spell Checking Engine is supplied with an English language
dictionary containing over 100,000 words. The dictionary is stored in
a compressed format to minimize disk storage requirements and enhance
lookup speed.

A maintenance utility is provided which allows adding or removing
custom words or abbreviations to the main dictionary. This is
important to applications in medical, legal, scientific and other
specialized fields.

Auxiliary dictionaries can also be created for this purpose. Users
can add their own words "on the fly," and these additions can be
incorporated into the main or auxiliary dictionaries if desired.

The Spell Checking Engine provides flexibility to accommodate a broad
spectrum of end-user computing. . .

. . .optimized for speed, memory utilization or both, permitting spell
checking even in low memory situations.

The dictionary can be loaded into memory for optimal performance if
sufficient free memory is available, or it can. . .

. . .manner the developer chooses, e.g., popup box, bar list, etc.
Suggestions can be tailored to the context of the word, including
such factors as capitalization. Unique features allow checking over an
expanded alphabet, including alphabetic, numeric and special characters.

Jake Geller's Spell Checking Engine sells for $249, including all
dictionaries, support utilities and an extensive user manual. This
price includes 25 licenses to distribute complete applications
containing . . .
VOICE PROCESSING CORP

Voice Technology News, v6, n25, pN/A

Dec 13, 1994

Language: English Record Type: Fulltext
Document Type: Magazine/Journal; Trade
Word Count: 126

(USE FORMAT 7 FOR FULLTEXT)
TEXT:

...because it can obtain the pronunciation of words used in applications
from any combination of VProFlex's built-in 100,000 word phonetic
dictionary of acoustic word models, text-to-speech phonetic output using
industry standard phonetic alphabets and application supplied "pronounce"
statements. Other enhancements include word spotting, application
specific context-free grammar and continuous input. VProFlex evaluation
kits are available for qualified developers and a full software...
DIALOG (R) File 16: Gale Group PROMT (R)
08913540 Supplier Number: 77275800 (USE FORMAT 7 FOR FULLTEXT)

Visa's Curriculum Site Brings Praise, Questions. (educational website
supposedly promotes use of Visa credit cards) (Brief Article)

Kingson, Jennifer A.; Kuykendall, Lavonne
American Banker, v166, n159, p9
August 17, 2001

Language: English Record Type: Fulltext
Article Type: Brief Article
Document Type: Magazine/Journal; Trade
Word Count: 856

... to educators.

According to the Web site, teachers praised
www.practicalmoneyskills.com for, among other things, its glossary of
banking and financial terms, its detailed lesson plans for preschool
through 12th grade, and for "not...

...curriculum does emphasize using the term "check card" -- which is Visa's
particular name for a generic product, the debit card. In the glossary,
there is a detailed definition of the term "check card" (which does not
include the word debit). Next to the term "debit card," it says, "See
'check card.'"

The curriculum includes a 10-page lesson on shopping for...
Word Processing. (Microsoft Word) (Technology Tutorial)

Campbell, George

PC World, v17, n4, p289(1)

April, 1999

Language: English Record Type: Fulltext
Document Type: Magazine/Journal; General Trade
Word Count: 1416

. . . in the Macros list. 

4. Select &File from the Change What Menu drop-down list. 

5. Click Add , then click Close. 

Fix Errors In Word's Custom Dictionaries 

It happens to almost everyone: While rushing through a spelling
check, you mistakenly add a misspelled (or mistyped) word to Word's custom
dictionary. From then on, Word doesn't flag that misspelled word as
incorrect, so it may go unnoticed. Fortunately, the custom dictionary
is a simple text document that you can easily edit. Here's how:

1. Select Tools, Options...

...6 and 7, click the Spelling tab.

3. In Word 97, click Dictionaries. In Word 7, click Custom
Dictionaries. Word 6 users can skip this step.

4. Select CUSTOM.DIC or the dictionary you want to change in the
Custom Dictionaries list.

5. Click Edit, then click OK in the advisory dialog box that appears.

6. After Word. . .

. . .you added erroneously.

7. Once your edits are finished, select File, Save, then File, Close.

8. In Word 7 and 97, you'll need to select Tools, Options, click
the Spelling & Grammar tab, and enable. . .

...prior to your edits. This is necessary because Word turns this feature
off when you edit a custom dictionary.

9. You can also add words to your custom dictionary if you want.
Just type them in during Step 6 and save.

Drag and Copy Text In. . . 
03153221 Supplier Number: 44306420
Software firm automates Japanese translations

Mass High Tech, p18
Dec 20, 1993

Language: English Record Type: Abstract
Document Type: Magazine/Journal; Trade

ABSTRACT:

...terns of the probability of being correct. The program also learns the
preferred usage of sentences or words in a certain context. Users may
add all or parts of 19 separate special dictionaries for physics,
chemistry, engineering, business, architecture or other applied sciences.
The program also permits users to create custom dictionaries.
PANASONIC'S NEW LAPTOP WORD PROCESSOR IS SMALL IN SIZE ONLY

News Release, p1
June 21, 1990

Language: English Record Type: Abstract
Document Type: Magazine/Journal; Trade

ABSTRACT:

...word processor that brings new meaning to the word "personal." About the
size of a standard hardcover dictionary, the KX-WL50 offers exceptional
portability, and is perfect for use by travelling executives, by college
students . . .

...the KX-WL50 has full word processing capabilities built-in, including
word search and replace, block move/copy/delete, next word, word
wrap, format memory and memory quantity. The unit also features Panasonic's
63,000-word Accu-Spell Plus (TM) built-in dictionary, with 120 words
user-programmable. ...
01006827 Supplier Number: 41098820 (USE FORMAT 7 FOR FULLTEXT)
Canon's introduction at Winter Consumer Electronics Show next week will
include 2 word processors, 2 personal copiers and fax phone

Consumer Electronics, pN/A
Jan 1, 1990

Language: English Record Type: Fulltext
Document Type: Newsletter; Trade
Word Count: 103

(USE FORMAT 7 FOR FULLTEXT)
TEXT:

Canon's introductions at Winter Consumer Electronics Show next week will
include 2 word processors, 2 personal copiers and fax phone, all aimed
at home office market. Canon BW 70 word...

...LCD, 3.5", 720 K-byte floppy disc, capability for more than 20
languages, 90,000-word built-in dictionary. Other model, TW 40 ($700 in
Oct.), has same features but slower printing speed. Personal copiers, at...

11/3, K/18 (Item 1 from file: 148) 
Editing Word's dictionaries. (tips for using Microsoft's Word word
processing software) (question-and-answer) (Product Support) (Brief
Article)

PC World, v14, n11, p302(1)
Nov, 1996

DOCUMENT TYPE: Brief Article ISSN: 0737-8939 LANGUAGE: English

RECORD TYPE: Fulltext

WORD COUNT: 151 LINE COUNT: 00014

Li Yu, People's Republic of China

Word stores its custom dictionary as an easy-to-edit ASCII file.
Select Tools*Options, then click the Spelling tab in the Options dialog
box. Choose the dictionary you want to edit from the Custom
Dictionaries list and click Edit. Click Yes in the resulting dialog box,
and click OK if asked to. . .

. . . in the Options dialog box to close it, then make your changes in
custom.dic. Select File*Save, then File*Close.

In Word 7, click the Custom Dictionaries button in the Options
dialog box before choosing the dictionary you want to edit.
02912078 860468781
Making Its Own Bed
Stankevich, Debby Garbato

Retail Merchandiser v40n5 PP: 83-85 May 2000
ISSN: 1530-8154 JRNL CODE: DMD
WORD COUNT: 2770

...TEXT: aggressive - including Wal-Mart, ShopKo and Kmart - Bed Bath & Beyond
is not trying to put its whole store online tomorrow. "Our next step is
to get word out to the 240 stores," says Eisenberg. "We haven't seen it
show up on shopping bags yet. We haven't put...

...m. will have the finer points of anodized aluminum (cookware) explained.
The customer rep even has a product guide glossary to assist."

Unlike in its brick-and-mortar operation, the company is using a
fulfillment center to. . .
02791469 696196021
creative techniques

Bajaj, Geetesh

Presentations v18n9 PP: 16-17 Sep 2004
ISSN: 1072-7531 JRNL CODE: PRS
WORD COUNT: 1302

. . .TEXT: taskpane metaphor to spawn new capabilities. By default, the
Research task pane can be used as a dictionary and thesaurus, and
includes access to reference books and research sites. You can add even
more reference. . .

. . .2 The Research task pane produces a list of definitions from any of the
several built-in dictionary versions.

A defining resource

Most PowerPoint users I surveyed turn to the Research task pane as a...

. . .version of the North America Encarta Dictionary. For example, the
Research task pane gave me six detailed definitions of the word
visible, along with a text pronunciation (see EXAMPLE 2). You can also
switch to the United Kingdom version of the Encarta Dictionary, which
showed me seven definitions for the same word. A search for the word
technology got me three definitions with sample sentences using the word
in all contexts. Several noun, adjective and adverb forms of the word
were also displayed. Unfortunately, the task pane has no audio component to
let you hear how a word is pronounced.

The Research task pane also includes several built-in thesauri. Searching
within this pane, I found three English, two French and two Spanish. . .
01145690 CMP ACCESSION NUMBER: WIN19971115S0013

Applications - From contact managers to spreadsheets and databases, these
tips will help you stay organized and boost your productivity.

WINDOWS MAGAZINE, 1997, n 811A, PG101
PUBLICATION DATE: 971115

JOURNAL CODE: WIN LANGUAGE: English

RECORD TYPE: Fulltext
SECTION HEADING: 2,001 Tips
WORD COUNT: 21282

... the paragraph style or styles you don't need. Click on Delete, Yes
and Close.

Add/Use Glossary Entries

Store frequently used text in a glossary entry. To create an
entry, highlight the desired text and select Edit/Glossary. Type a name in
the Glossary Entry Name box and click on Create. To use a glossary
entry, place the cursor where you want the text and select Edit/Glossary.
Choose the appropriate glossary entry name from the Glossary Entry
Name box and click on Insert.

Word Pro 97

TASK. . . .SHORTCUT

Delete next word. . . .Ctrl+Del
Delete previous word. . . .Ctrl+Backspace
Delete a row in a table. . . .Ctrl+- (minus key on numeric keypad)
Center text. . . .Ctrl. . .

. . .Alt+

Go To. . . .Ctrl+

Insert a row in a table. . . .Ctrl++ (plus key on numeric keypad)
Insert glossary record. . . .Ctrl+
Redo action. . . .Alt+Shift+Backspace or Ctrl+Shift+
Show/Hide set of SmartIcons. . . .Ctrl+
Undo . . .
WordLogic Corporation Launches WordLogic Text Input Software for Wireless
Information Devices at VORTEX 2001

Business Wire

Tuesday, May 22, 2001 09:08 EDT

JOURNAL CODE: BW LANGUAGE: ENGLISH RECORD TYPE: FULLTEXT
DOCUMENT TYPE: NEWSWIRE
WORD COUNT: 613

...is displayed onscreen and is compact and intuitive. With a tap of
a letter, it predicts the next most probable letters and words. In two
taps, most words can be entered. More complex words are completed with
WordLogic Corporation's proprietary WordChunking (TM) technology. The
WordLogic keyboard remembers common phrases and adapts to each user's
vocabulary incorporating both standard and custom dictionaries.
DOCUMENT KNOWLEDGE MANAGEMENT APPARATUS AND METHOD
Total word count - document A 93449

Total word count - document B 0
Total word count - documents A + B 93449
INTERNATIONAL PATENT CLASS: G06F-017/30 ...

G06F-019/00

...SPECIFICATION semantic dictionary creating step. According to the
present program, each of the terms (keywords, abbreviations, synonyms,
related words, etc. included in the dictionary) entered in the
pre-existing dictionary information is assessed, based on the term, as
being a canonical form. . .

...A dictionary information processing program according to still another
aspect of the present invention: wherein the semantic dictionary
creating step further comprises a Web term appraising step that assesses,
based on terms entered in a pre-existing dictionary information,
whether each of the terms in the Web information is to be considered as a
canonical form, variant form, or a term that is not to be used, and
creates the semantic dictionary information from each term of the Web
information, based on an appraisal result of the Web term appraising
step.

This is a more specific explanation of the semantic dictionary
creating step. According to the present program, each of the terms in
pre-existing Web information (including. . .



9/3,K/7 (Item 7 from file: 348) 

DIALOG (R) File 348: EUROPEAN PATENTS 
(c) 2005 European Patent Office. All rts . reserv. 



01507255 

Database model, tools and methods for organizing information across 

external information objects 
Modele de base de donnees, outils et methode pour organiser des
ABSTRACT WORD COUNT: 187
Figure number on first page: 1

LANGUAGE (Publication, Procedural, Application): English; English; English
FULLTEXT AVAILABILITY:

Available Text Language Update Word Count
Total word count - document A 13252

Total word count - document B 0

Total word count - documents A + B 13252



INTERNATIONAL PATENT CLASS: G06F-017/30 



...SPECIFICATION limits, ranges excluding either or both of those included
limits are also included in the invention.

Unless defined otherwise, all technical and scientific terms used
herein have the same meaning as commonly understood by one of ordinary
skill in the art to which this invention belongs. Although...

...noted that as used herein and in the appended claims, the singular forms
"a", "an", and "the" include plural referents unless the context
clearly dictates otherwise. Thus, for example, reference to "a viewer"
includes a plurality of such viewers and. . .

...publication provided may be different from the actual publication dates
which may need to be independently confirmed.

DEFINITIONS

The term "activation" refers to enhancement of the effects of a
biological agent or stimulation of a biological or chemical process, for
example.

The term "alternative" when used in the context of describing a
biological story, refers to one choice among a number of possible
explanations (or hypotheses) for a biological phenomenon.

The . . .

. . .note that may be associated with any item, collection, story element,
diagram node, or diagram interaction.

The term "biological story" defines a high-level description or
explanation of a complex biological process, formulated by a researcher,
for example ...
INTERNATIONAL PATENT CLASS: G06F-017/30

...SPECIFICATION link to the "Global Slogan" document, he will see first
the display in FIG. 13 (without a dictionary mode button, because the
"Corporate Guidance" document is a hypertext document), then the display
in FIG. 14 (with a dictionary mode button, because the "Global Slogan"
document is not a hypertext document).

If the user selects the dictionary mode button on the display in FIG.
14, the linked document server 2 activates the dictionary linker 4,
which generates a result file according to the look-up counts maintained
by the dictionary access tabulator 18. The linked document server 2
adds an ordinary mode button, and sends this file...

..ten are used as described above. The first line in FIG. 61 is the
ordinary mode button added by the linked document server 2. The next
line contains only the word "We," because the look-up count (one) for
this word exceeds the threshold value (zero) for pronouns. The next
line contains the word "draw" together with a dictionary look-up
command tag and closing tag, because the look-up count (three) for this
word does not exceed the threshold value (ten) for verbs. Other lines are
generated similarly by the dictionary linker 4. The client device 1
displays this result file as shown in FIG. 62.
The user ...

. .will obtain a Japanese definition as shown in FIG. 16 or 27. If the user
selects the word "We," however, no definition will be returned and
the display in FIG. 62 will remain unchanged, because no dictionary
access tag is attached to this word.
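
The threshold rule in this excerpt can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the patent's implementation: the `<lookup>` tag name and the function names are hypothetical, and only the two thresholds quoted above (zero for pronouns, ten for verbs) are taken from the text.

```python
# Threshold values quoted in the excerpt; other parts of speech are not given.
THRESHOLDS = {"pronoun": 0, "verb": 10}


def needs_lookup_tag(lookup_count, part_of_speech, thresholds=THRESHOLDS):
    """A word keeps its dictionary link only while its look-up count
    does not exceed the threshold for its part of speech."""
    return lookup_count <= thresholds[part_of_speech]


def render_word(word, lookup_count, part_of_speech):
    """Wrap the word in a (hypothetical) look-up tag when the rule says so."""
    if needs_lookup_tag(lookup_count, part_of_speech):
        return f"<lookup>{word}</lookup>"
    return word


line_we = render_word("We", 1, "pronoun")   # count 1 exceeds threshold 0
line_draw = render_word("draw", 3, "verb")  # count 3 does not exceed 10
print(line_we)
print(line_draw)
```

The two calls reproduce the cases described for FIG. 61: "We" is emitted as plain text, while "draw" is emitted with a look-up tag.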

By not underlining words that the user has already looked. . . 

(AP) BW GH GM KE LS MW MZ NA SD SL SZ TZ UG ZM ZW
Detailed Description

... of author, and for each of a plurality of sentences within the
training text: identifying pairs of words, W1 and W2, with known
contexts within a sentence, used together in conjunction, and designating
matches between. . .



. .yet further provided in accordance with a preferred
embodiment of the present invention a method for resolving context
ambiguity within a natural language sentence, including providing a
plurality of context equivalence groups, with specific
pairs of the ...

..that are used in the same context, parsing a natural language sentence
to identify grammatical types of words within the sentence, identifying
context equivalence groups to which words within the sentence
belong, and resolving contexts of ambiguous words within the
sentence, consistent with matches between the identified context
equivalence groups.

There is additionally provided in accordance with a preferred
embodiment of the present invention apparatus for resolving context
ambiguity within a natural language sentence, including a memory for
storing a plurality of context equivalence groups, with specific
pairs of the context equivalence groups designated as being matched, a
context equivalence. . .

..language sentence to identify grammatical types of WO 2005/022294 
PCT/US2004/021779 sentence belong, and resolving contexts of ambiguous 
words within the sentence consistent with matches between the identified 
context equivalence groups . 



The following definitions are employed throughout the 
specification and claims. 



1. Ambiguity - more than one possible meaning for a word

2. Context Equivalence Group, also Group - a group of words of a common
Grammatical Type that can be used. . .
Detailed Description
. . . Conversations
R. SPECIFICATION FOR SEMANTIC QUERY DEFINITIONS & VISUALIZATIONS FOR
...with particular words, or particular strings of several sentences or
paragraphs.

The reason for this is that words do not denote or connote meaning
one to one as, for example, numerals tend to do. Put differently, certain
meanings can be denoted or connoted by several different words or an
essentially infinite combination of words, and, conversely, certain
words or combinations of words can denote or connote several different
meanings. Despite this infinite many-to-many network of possibilities,
human beings can (because of context, experience, reasoning,
inference, deduction, judgment, learning and the like) isolate probable
meanings, at least tolerably effectively most...
... an extremely fine-grained metric. Typically a database in today's
marketing world has 8-16 demographic definitions. Now that more
powerful machines are coming on line, up to several hundred demographic
categories are sometimes seen. With a typical database structure, adding
a demographic definition directly affects performance measured in
response time. The user profiling method allows for an actual and "able
to be applied" demographic group of one definition per user or multiple
millions without affecting performance. The similarities and functional
aspects of the individual demographic definitions can easily be applied
or associated with others in the group creating self-defining
micro-demographic pathways or groups. Traditional demographic
understanding can be extracted and applied within this process by
defining groups of users using words, thereby creating a built in
transition to legacy systems of consumer data, and a migration of
capability. . .

...Internet now includes corporations functioning in multi-lingual
environments. The user profiling method has the ability to define
words in a local context on an individual basis. This includes the
ability to work in foreign and even multiple languages including symbolic
languages. The data access and. . .
. . . For example,
the merchant website cannot contain any content or
subject matter that is offensive. Obviously, the term
"offensive" can have broad meaning. In this context,
the term generally includes material that is typically
referred to as pornographic, hateful, or demeaning of
others. However, the standards reflected by this
definition can be modified for the particular
situation.

The merchant database 120 is organized by listing
associated goods. . .
. . . Next N Letters Is Below a Threshold

A function can calculate the expected value of spelling the next N
letters of a name, word, or . . .

Many Definitions of Being Probably Stuck

As the procedures above demonstrate, there are many ways of
mathematically defining that a speaker is probably stuck. The point is
not the exact definition procedure used in a guide but that a
reasonable probabilistic procedure is used at all. An example...

...of names that match the abbreviation created thus far.

b. Calculates the expected information value of the next N letters in
all the words specified by the last word identifier entered.

c. If the expected value is below a certain threshold, outputs a message 
telling the speaker that she will probably make little progress for the 
next N letters. 
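
One way to read steps b and c is sketched below. The excerpt does not define "expected information value", so this sketch assumes it can be approximated by the Shannon entropy of the next N letters across the remaining candidate names, with every candidate equally likely. The function names, the candidate list and the threshold are all hypothetical.

```python
import math
from collections import Counter


def expected_information_bits(candidates, prefix, n):
    """Entropy (in bits) of the next n letters among candidates that
    match the prefix spelled so far. Assumes equally likely candidates."""
    continuations = Counter(
        name[len(prefix):len(prefix) + n]
        for name in candidates
        if name.startswith(prefix)
    )
    total = sum(continuations.values())
    if total == 0:
        return 0.0
    return sum((c / total) * math.log2(total / c) for c in continuations.values())


def stuck_message(candidates, prefix, n, threshold_bits):
    """Return a warning when the next n letters are expected to tell little."""
    if expected_information_bits(candidates, prefix, n) < threshold_bits:
        return f"Spelling the next {n} letters will probably make little progress."
    return None


# Made-up candidate names for the sketch.
names = ["johnson", "johnston", "johnstone", "jones"]
bits_after_john = expected_information_bits(names, "john", 2)
bits_after_johnst = expected_information_bits(names, "johnst", 1)
print(round(bits_after_john, 3))
print(bits_after_johnst)
print(stuck_message(names, "johnst", 1, threshold_bits=0.5))
```

After "johnst" both remaining names continue with the same letter, so the next letter carries no information and the warning is produced; after "john" the next two letters still separate the candidates, so no warning is given at the same threshold.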

Threshold Definitions of Being Stuck 

The procedures above for defining whether a speaker is stuck include 
threshold values. If... 



